How do you balance automation with manual control in network management?

Thomas Drennan
Level 1

Hi everyone,

I’ve been working on improving our internal network automation setup, and one challenge I keep running into is deciding how much control to leave for manual adjustments.

Full automation sounds great on paper, but sometimes manual tweaks are still needed, especially during troubleshooting or configuration rollbacks.

How do you handle this balance in your environment? Do you prefer full automation with monitoring alerts, or partial workflows where admins still confirm changes?

Would love to hear how others structure their network management process, especially those using Cisco APIs or automation tools.

2 Replies

Jesus Illescas
Cisco Employee

I think it should always be possible to leave room for manual changes. In a past job with a cloud provider, where the whole network was automated, there were occasions when a manual change was required to fix a P0 or P1 incident, and the out-of-band change was later merged into the automation. This was very unusual; I saw it only once or twice, but the process was in place for these situations.

Talking about Cisco products, NSO, for example, is introducing a confirm-network-state feature because these scenarios do happen. So a good practice is to automate, but have a process in place for manual changes and for merging them back into the automation.
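To make the "merge manual changes back" step concrete, here is a minimal sketch of drift detection in Python. It assumes you can already fetch the intended config (from your source of truth) and the running config (from the device) as plain text; `config_drift` is a hypothetical helper name, not an NSO or Cisco API. It only reports what differs so that someone can decide whether to merge or revert.

```python
import difflib


def config_drift(intended: str, running: str) -> list[str]:
    """Return the lines that differ between intended and running config.

    Lines prefixed with "- " exist only in the intended config,
    lines prefixed with "+ " exist only on the device (out-of-band changes).
    """
    intended_lines = [line.rstrip() for line in intended.strip().splitlines()]
    running_lines = [line.rstrip() for line in running.strip().splitlines()]
    diff = difflib.ndiff(intended_lines, running_lines)
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("+ ", "- "))]


if __name__ == "__main__":
    # Hypothetical example: someone changed the NTP server by hand during an incident.
    intended = "hostname r1\nntp server 10.0.0.1"
    running = "hostname r1\nntp server 10.0.0.2"
    for line in config_drift(intended, running):
        print(line)
```

An empty result means the device matches the automation. Anything else is a candidate for either merging into the source of truth or rolling back.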

@Thomas Drennan, in my experience no one has a 100% automated network, and at this stage locking out CLI or manual changes would not be advised (not that this was the question, but worth stating). Even the most mature environments do not go fully automated everywhere; they use a tiered method based on risk and frequency.

For example (not an exhaustive list, YMMV):

T1 (no approval) - Any data reading/collection, monitoring, documentation

T2 (approval gates) - Config updates, image updates, rollbacks

T3 (manual) - Major architecture changes, funky show commands with TAC (control plane issues, for example), P1 customer fixes
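As a rough illustration of how the tiers above could be encoded in a workflow, here is a small Python sketch. The change-type names, the `decide` function and the choice to treat unknown change types as T3 are all assumptions for the example, not part of any Cisco tool.

```python
from enum import Enum


class Tier(Enum):
    T1_NO_APPROVAL = 1
    T2_APPROVAL_GATE = 2
    T3_MANUAL = 3


# Hypothetical mapping of change types to tiers, following the list above.
CHANGE_TIERS = {
    "collect_data": Tier.T1_NO_APPROVAL,
    "monitoring": Tier.T1_NO_APPROVAL,
    "documentation": Tier.T1_NO_APPROVAL,
    "config_update": Tier.T2_APPROVAL_GATE,
    "image_update": Tier.T2_APPROVAL_GATE,
    "rollback": Tier.T2_APPROVAL_GATE,
    "architecture_change": Tier.T3_MANUAL,
    "p1_customer_fix": Tier.T3_MANUAL,
}


def decide(change_type: str, approved: bool = False) -> str:
    """Return "run", "wait_for_approval" or "manual" for a requested change."""
    # Assumption: anything not explicitly classified gets the most cautious tier.
    tier = CHANGE_TIERS.get(change_type, Tier.T3_MANUAL)
    if tier is Tier.T1_NO_APPROVAL:
        return "run"
    if tier is Tier.T2_APPROVAL_GATE:
        return "run" if approved else "wait_for_approval"
    return "manual"


if __name__ == "__main__":
    print(decide("collect_data"))
    print(decide("config_update"))
    print(decide("config_update", approved=True))
    print(decide("architecture_change", approved=True))
```

Keeping the mapping in one table makes it easy to review and to move a change type between tiers as confidence in the automation grows.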

Monday morning, lacking coffee, top-of-mind thoughts: idempotency is crucial for reliable network automation, since it reduces the "did it actually apply?" confusion. When automation becomes a "magic black box" that works great 99% of the time but prevents you from understanding what's actually happening, you've created a few serious issues. This is what we call the 'abstraction paradox'.
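For anyone unsure what idempotency looks like in practice, here is a deliberately simplified Python sketch. It treats a config as a flat list of global lines (real device configs are hierarchical, so this is an assumption for illustration only), and `apply_idempotent` is a hypothetical name. The point is that running it twice gives the same result as running it once, and the returned change list tells you what was actually applied.

```python
def apply_idempotent(running: list[str], desired: list[str]) -> tuple[list[str], list[str]]:
    """Add only the desired lines that are missing from the running config.

    Returns (new_config, changes). An empty changes list means nothing
    needed to be applied, which answers "did it actually apply?".
    """
    new_config = list(running)  # never mutate the caller's list
    changes = []
    for line in desired:
        if line not in new_config:
            new_config.append(line)
            changes.append(line)
    return new_config, changes


if __name__ == "__main__":
    running = ["hostname r1"]
    desired = ["hostname r1", "ntp server 10.0.0.1"]

    config, changes = apply_idempotent(running, desired)
    print(changes)  # the one missing line

    config, changes = apply_idempotent(config, desired)
    print(changes)  # nothing left to apply on the second run
```

Logging the change list on every run also helps with the black box problem, because the tooling shows what it did rather than just reporting success.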

Back in the day, when I had to put in manual fixes or hotfixes that remained a permanent fixture, they had to be written into the automation flow. Otherwise, when the golden checks ran, the change would jump out and be flagged as an issue later, or leave others confused when they saw it, not knowing why it was there.

Please mark this as helpful or solution accepted to help others
Connect with me https://bigevilbeard.github.io