by throw0101a 1233 days ago
> That's the norm in network ops. Automated testing is pretty much impossible, easy rollback may be possible depending on exactly what was screwed, but not always.

Ansible/NAPALM is a thing in NetOps in some places. Some folks use EVE-NG / GNS3 to spin up virtual networks to test config changes, and it may be possible to do CI/CD for changes if you track your configs in a repo.
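
As a rough sketch of what a repo-driven check might look like, here is a plain-Python diff of a candidate config against the running one, the sort of thing a CI job could attach to a change request. The `config_diff` helper and both config snippets are made up for illustration; in practice a tool like NAPALM would fetch the running config from the device.

```python
import difflib
from typing import List


def config_diff(running: str, candidate: str) -> List[str]:
    """Return unified-diff lines between running and candidate config text."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            candidate.splitlines(),
            fromfile="running",
            tofile="candidate",
            lineterm="",
        )
    )


# Hypothetical config snippets, not taken from any real device.
RUNNING = (
    "set system host-name edge1\n"
    "set protocols bgp group ext neighbor 192.0.2.1\n"
)
CANDIDATE = (
    "set system host-name edge1\n"
    "set protocols bgp group ext neighbor 192.0.2.2\n"
)

if __name__ == "__main__":
    # Print the diff so a reviewer sees exactly which lines would change.
    for line in config_diff(RUNNING, CANDIDATE):
        print(line)
```

An empty diff means the candidate matches what is already running, so a pipeline could skip the deploy step in that case.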

Juniper JunOS has auto-rollback ("commit confirmed") if you don't confirm the change within "x" minutes:

* https://www.juniper.net/documentation/us/en/software/junos/c...

So if you did something that causes breakage and disconnection from the router, you (ideally) don't have to do anything but wait it out.
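
To make the behaviour concrete, here is a toy Python model of the idea: a commit goes live immediately but reverts on its own unless it is confirmed before a deadline. This is only a sketch of the semantics, not how JunOS implements it; the `Router` class, its method names, and the explicit `now` clock argument are all invented for the example.

```python
from typing import Optional


class Router:
    """Toy model of commit-confirmed: a commit reverts unless confirmed in time."""

    def __init__(self, config: str) -> None:
        self.active = config                      # config currently in effect
        self._previous: Optional[str] = None      # config to restore on rollback
        self._deadline: Optional[float] = None    # when an unconfirmed commit reverts

    def commit_confirmed(self, new_config: str, minutes: float, now: float) -> None:
        """Activate new_config now; schedule a rollback `minutes` from `now`."""
        self._previous = self.active
        self.active = new_config
        self._deadline = now + minutes * 60

    def tick(self, now: float) -> None:
        """Advance the clock; roll back if the confirmation window has expired."""
        if self._deadline is not None and now >= self._deadline:
            self.active = self._previous
            self._previous = None
            self._deadline = None

    def confirm(self, now: float) -> bool:
        """Confirm the pending commit.

        Returns False if nothing is pending (for example, it already reverted).
        """
        self.tick(now)
        if self._deadline is None:
            return False
        self._previous = None
        self._deadline = None
        return True


if __name__ == "__main__":
    r = Router("good")
    r.commit_confirmed("locks-me-out", minutes=10, now=0)
    r.tick(now=599)   # still inside the window: the bad config is live
    r.tick(now=600)   # window expired: the router reverts by itself
    print(r.active)
```

The point of the model is the locked-out case: nobody ever calls `confirm`, and the old config comes back anyway once the timer runs out.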


Emulating even a mid-sized network in GNS3 requires massive resources, and my Cisco account manager doesn't seem to get why I'd want to deploy a test system of 50 different multi-vendor switches (plus key supporting services like syslog and TACACS) with Terraform, run some tests, apply a configuration change, and run more tests.

And virtual switches aren't the same as physical switches in any case: they have different bugs, different features, and different responsiveness.

"commit confirmed" is such a life-saver. I ran a production network that spanned multiple continents, and even though I probably only ever actually needed commit confirmed a single-digit number of times, the fact that it was there made every change I made 99% less stressful. I knew that even if I made a mistake, all I had to do was wait 5-10 minutes and it would all revert.

Compare this to my Cisco/Foundry/other experience, where I would delay changes until I was in the office (physically colocated with the main routers) or call people to be onsite for what was, 99% of the time, an innocuous change. The stress of it led to me deferring changes or just skipping them entirely, which led to more issues, more stress, etc.

I'm really not sure there is a single software feature that improved my life as much as "commit confirmed".

So instead of one ripple across your BGP network, you have two as it rolls back the change?

The problem is that the state in routing tables isn't stored in a single location; it's built dynamically over time. Breaking a single router in the wrong way can break that state, and there's no rollback of that state.

> So instead of one ripple across your BGP network, you have two as it rolls back the change?

It's possible; it depends on the nature of the change. If you use super short commit confirmed intervals (commit confirmed 1), then yes, you can end up reverting a "good" commit and causing a second disturbance. You need to reason carefully about commit confirmed times when you're making such changes.
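
One back-of-the-envelope way to reason about the interval: it should at least cover the time the network needs to settle plus the time you need to check that things still work, with some slack. The sketch below does that arithmetic in Python. The `min_confirm_minutes` helper, the 2x safety margin, and the sample timings are all assumptions for illustration, not Juniper guidance.

```python
import math


def min_confirm_minutes(convergence_s: float, verify_s: float, margin: float = 2.0) -> int:
    """Smallest whole-minute window covering convergence plus verification.

    convergence_s: seconds you expect routing to take to settle (an estimate)
    verify_s:      seconds you need to check the change and type the confirm
    margin:        safety multiplier (2.0 is an arbitrary illustrative default)
    """
    total_seconds = (convergence_s + verify_s) * margin
    return max(1, math.ceil(total_seconds / 60))


if __name__ == "__main__":
    # Hypothetical timings: 90 s to converge, 120 s to verify.
    print(min_confirm_minutes(90, 120))
    # Even a quick change (20 s + 15 s) does not fit in a 1-minute window
    # once the margin is applied.
    print(min_confirm_minutes(20, 15))
```

With these made-up numbers, a 1-minute window is too short in both cases, which is the scenario where a perfectly good commit gets reverted and you take the second disturbance.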