| HN Mirror

by nemothekid 56 days ago

>Fortunately, we still have /etc/hosts, which we can easily provision

This is the kind of thing you read in a post-mortem and wonder how they designed something so fiendishly wonderful.

At 2:00am our MySQL master failed and failed over successfully to our secondary server. As part of post-failover ops, an Ansible playbook proceeded to log in to 1000 instances to update the hosts file for the new master. This caused traffic amplification, which caused our etcd nodes to believe they were down. As the etcd nodes failed over, our Ansible playbook proceeded to then log in to 1000 instances to update the hosts file...

Honestly, whatever system you built is just doing the exact same function as DNS, just with extra steps. If you squint really hard, /etc/hosts is your local DNS cache and Ansible is your resolver. I think this kind of "simplification fetishization" is dangerously attractive to people who have only managed relatively simple setups. I don't think anyone who has ever had to deal with high-availability failover would consider Ansible a good solution.

The problem that so many people hit with DNS isn't specific to DNS the protocol - it's the problem of service discovery. This architecture doesn't eliminate service discovery, it just moves it to a far more brittle configuration.

4 comments

protocolture 56 days ago

I had an ISP customer years ago that had an AAA system designed by people who didn't understand DNS, DHCP or RADIUS. They also had no idea about NetFlow or SNMP.

The application would log into every router in the network and run a massive, on-the-fly script to manually create a bunch of PPPoE services and shaping targets for those connections, update firewall rules, etc.

It would also run manual MikroTik bandwidth tests across every logical link it was aware of.

The application developers were adamant that this was the best way of doing things, and any disagreement would have them point at their dozen or so customers and boast that they surely wouldn't have been able to hoodwink that many people if they were doing it wrong.

Anyway, we took a packet capture of the script updates that ran every 10 minutes and showed the customer that they consumed a whole-number percentage of the bandwidth to certain smaller sites. We were also able to show them how they stopped getting "My internet goes out every 10 minutes" complaints once we turned off the automatic MikroTik bandwidth tests running every 10 minutes.

To keep their customer, the application developers agreed to implement SNMP and RADIUS, but they never did. IIRC their fee was a flat 15% of all profits generated by the customer, which was just staggering. And the fee could rise if they asked for support.

JdeBP 56 days ago

One does not even need to squint. The first page of RFC 882 explains outright that the DNS came about in the first place because the mechanisms for updating a HOSTS.TXT file and publishing it to loads of places did not scale.

That's still just as true for the intranets of the 2020s, with thousands of machines all downloading a HOSTS file several times a day (or even hour/minute), as it was for the Internet of July 1983, with around 500 hosts, where the file was merely downloaded by everyone a couple of times per week. The fact that a file can be copied faster now is counterbalanced by the fact that tying this to real-time failover means that it needs to be updated and distributed several orders of magnitude more quickly than it was in 1983 too. And that's ignoring the linear nature of a HOSTS file lookup contrasted with even the stupidest DNS implementation.

Those who think that HOSTS is a fallback for any sort of dynamic operation (into and out of service) of even hundreds of machines have not learned the history of why the DNS even exists.
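
To make the "linear nature of a HOSTS file lookup" point concrete, here is a toy Python sketch. It is not how any particular resolver library is implemented; `HOSTS_TEXT`, its entries, and all function names are hypothetical. It contrasts scanning hosts-format text on every lookup with building an index once.

```python
# Toy model only: HOSTS_TEXT and every helper name here are hypothetical.

def parse_hosts(text):
    """Yield (address, names) pairs from hosts-format text, skipping comments."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        yield address, names

def lookup_linear(text, name):
    """Scan entries top to bottom; first match wins, cost grows with file size."""
    for address, names in parse_hosts(text):
        if name in names:
            return address
    return None

def build_index(text):
    """Build a name -> address map once; later lookups are one dict access."""
    index = {}
    for address, names in parse_hosts(text):
        for name in names:
            index.setdefault(name, address)  # keep the first match, like the scan
    return index

HOSTS_TEXT = """
127.0.0.1   localhost
10.0.0.5    db-master db   # hypothetical entries
10.0.0.6    db-replica
"""
```

With thousands of entries, `lookup_linear` re-reads the whole list for a name near the end (or a name that is absent), while the index pays the parsing cost once, at the price of having to be rebuilt whenever the file changes.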

adrian_b 55 days ago

This is true, but TFA argued that the network services themselves should not use /etc/hosts or any similar translation database, so a not-yet-updated file should not cause a network outage.

TFA proposed that /etc/hosts or the like should be used only for the benefit of administrators, to allow manual connections by name instead of by address, and presumably to make the activity logs easier to interpret. This is a desirable feature, but the network should work fine even when the name-to-address translation is temporarily unavailable because of not-yet-updated /etc/hosts files.

Actually, I have used for decades a system similar to what TFA proposes, avoiding DNS queries for the internal networks while using my own DNS caching resolver for the Internet, but this was done only in relatively small networks, with a few hundred nodes at most, and where the IP addresses were changed infrequently. Thus I have no idea whether in a big network with frequently changed addresses there would be scaling problems.
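
The split described above (services connect by address; names exist only for humans) might look roughly like the following Python sketch. `DB_ADDRESS`, `ADMIN_NAMES`, `label`, and `connect` are all hypothetical names made up for illustration, not anything taken from TFA.

```python
# Sketch only: DB_ADDRESS, ADMIN_NAMES, and both helpers are hypothetical.
import socket

DB_ADDRESS = "10.0.0.5"                  # services are configured with addresses
ADMIN_NAMES = {"10.0.0.5": "db-master"}  # name table for humans; may be stale

def label(address, names=ADMIN_NAMES):
    """Return 'name (address)' for logs, or the bare address if no name is known."""
    name = names.get(address)
    return f"{name} ({address})" if name else address

def connect(address, port, timeout=3.0):
    """Connect using a numeric address, so no name lookup sits on this path."""
    return socket.create_connection((address, port), timeout=timeout)
```

A missing or outdated entry in `ADMIN_NAMES` only degrades the log line to a bare address; the connection path never consults the table.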

nemothekid 56 days ago

>The first page of RFC 882 explains outright that the DNS came about in the first place because the mechanisms for updating a HOSTS.TXT file and publishing it to loads of places did not scale.

Great piece of history. The RFC is a bit older than I am so I've never studied it. Looking at it that way, then OP has just re-invented DNS.

sshine 55 days ago

And what a great invention DNS is.

If you need to eliminate DNS and convince the internet it's largely unnecessary for the use-cases we have today...

...only to completely reinvent DNS for those use-cases with inferior technology that eventually becomes DNS...

...then you have achieved wisdom. I applaud the author for being on this journey.

sshine 55 days ago

> Ansible

Shell scripts wrapped in YAML

trumpdong 55 days ago

You can, and we did, use VRRP, anycast, or similar protocols/techniques to have an IP address that is always the current MySQL master.
