by nemothekid
9 days ago
>Fortunately, we still have /etc/hosts, which we can easily provision

This is the kind of thing you read in a post-mortem and wonder how they designed something so fiendishly wonderful.

At 2:00am our MySQL master failed and failed over successfully to our secondary server. As part of post-failover ops, an ansible playbook proceeded to log in to 1000 instances to update the hosts file for the new master. This caused traffic amplification, which caused our etcd nodes to believe they were down. As the etcd nodes failed over, our ansible playbook proceeded to then log in to 1000 instances to update the hosts file...

Honestly, whatever system you built is just doing the exact same function as DNS, just with extra steps. If you squint really hard, /etc/hosts is your local DNS cache and ansible is your resolver.

I think this kind of "simplification fetishization" is dangerously attractive to people who have only managed relatively simple setups. I don't think anyone who has ever had to deal with high-availability failover would consider Ansible a good solution. The problem that so many people hit with DNS isn't specific to DNS the protocol - it's the problem of service discovery. This architecture doesn't eliminate service discovery, it just moves it to a far more brittle configuration.
|
The application would log into every router in the network and run a massive, on-the-fly script to manually create a bunch of PPPoE services, set up shaping targets for those connections, update firewall rules, etc.
It would also run manual MikroTik bandwidth tests across every logical link it was aware of.
The application developers were adamant that this was the best way of doing things, and any disagreement would have them point at their dozen or so customers and boast that they surely wouldn't have been able to hoodwink that many people if they were doing it wrong.
Anyway, we took a packet capture of the script updates that ran every 10 minutes and showed the customer that they consumed a whole-number percentage of the bandwidth to certain smaller sites. We were also able to show them how the "My internet goes out every 10 minutes" complaints stopped once we turned off the automatic MikroTik bandwidth tests running every 10 minutes.
To keep their customer, the application developers agreed to implement SNMP and RADIUS, but they never did. IIRC their fee was a flat 15% of all profits generated by the customer, which was just staggering. And the fee could rise if the customer asked for support.