by spenczar5 2163 days ago
Networking in general is a far less sophisticated world than we might like to hope. You have to deal with quirks of vendor-specific firmware, creaky protocols, and so on, and the culture of networking has been a bit behind some other areas of software in embracing testing in the way you describe.

We'll get there, but it's no surprise CF isn't doing this today; it would put them waaaay ahead of the pack if they did.


Nothing stops you from replicating your backbone network using a bunch of vMX VMs and testing your changes on it.

It would not catch weird firmware quirks in the real hardware, but it definitely would've caught this fat-finger typo.

Well, the thing that stops you is the cost of designing, implementing, maintaining, and scaling the replica testbed. On a large network, that would be pretty hard to justify to most organizations, which would see it as very costly with a tough-to-measure upside.

Have you done this before? I'd be interested to hear how those conversations went.

For an organization like CF? Yes, I would expect them to have testing and network simulations down to an art.

If I had to guess, I'd say it's because network engineers simply don't need (or get the chance to build) this know-how at normal scale. Most SW developers, on the other hand, are not very good at networking (not good enough for CF scale, anyway). Which leads to the networking guys doing things the way they were always done... (hope I didn't offend anyone, just guessing)

I hope they strengthen their dev department... I know I'd love a challenge like that. :)

This is understandable for most organisations, but not for networking-centric businesses like Cloudflare.

I have to wonder if it would have. Unless you have some kind of route-visibility collecting tool, or a bunch of simulated traffic sufficient to pop the CPU on the vMX that represented atl01, it would all appear to work. I wonder if you could generate traffic and scrape SNMP counters as a proxy?

Or some kind of tool that processes the resultant routing tables to generate some kind of "route usage" for every given link and device, maybe even feed it with a table of expected traffic to given destinations.
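
To make the "route usage" idea concrete, here is a rough sketch of what such a tool might look like. Everything in it is an assumption for illustration: the simplified per-device next-hop tables, the site names other than atl01, the traffic figures, and the link capacity are all made up, and a real tool would have to parse actual RIB/FIB dumps and do longest-prefix matching and ECMP.

```python
from collections import defaultdict


def link_usage(next_hops, demands, max_hops=32):
    """Walk each demand hop by hop and sum the expected load per directed link.

    next_hops: {device: {destination: next_device}}  (hypothetical simplified routing table)
    demands:   [(source_device, destination, mbps), ...]  (table of expected traffic)
    Returns (usage, problems); usage maps (from_device, to_device) -> mbps.
    """
    usage = defaultdict(float)
    problems = []
    for src, dst, mbps in demands:
        node, seen = src, {src}
        while node != dst:
            nxt = next_hops.get(node, {}).get(dst)
            if nxt is None:
                # No route towards dst on this device: traffic is dropped here.
                problems.append(("blackhole", src, dst, node))
                break
            usage[(node, nxt)] += mbps
            if nxt in seen or len(seen) >= max_hops:
                # Revisiting a device (or exceeding the hop budget) means a loop.
                problems.append(("loop", src, dst, nxt))
                break
            seen.add(nxt)
            node = nxt
    return dict(usage), problems


def overloaded(usage, capacity):
    """Return the links whose expected load exceeds their known capacity."""
    return sorted(
        link for link, mbps in usage.items()
        if mbps > capacity.get(link, float("inf"))
    )


# Hypothetical example: a change makes every site send iad01 traffic via atl01.
demands = [("mia01", "iad01", 400.0), ("ord01", "iad01", 500.0), ("atl01", "iad01", 200.0)]
capacity = {("atl01", "iad01"): 1000.0}  # Mbps, made-up figure

before = {"mia01": {"iad01": "iad01"}, "ord01": {"iad01": "iad01"}, "atl01": {"iad01": "iad01"}}
after = {"mia01": {"iad01": "atl01"}, "ord01": {"iad01": "atl01"}, "atl01": {"iad01": "iad01"}}

usage_before, problems_before = link_usage(before, demands)
usage_after, problems_after = link_usage(after, demands)

print(overloaded(usage_before, capacity))
print(overloaded(usage_after, capacity))
```

In this toy case every destination stays reachable after the change, so a plain reachability check on the replica would pass. Only the per-link totals show atl01's uplink being asked to carry more than it can, which is the kind of signal the comment above is after.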