by blyry 2168 days ago
Before 2.0 there were a few bugs with SRV discovery, so maybe they adopted early and got bitten? Just an anecdote, but we've been using it since 1.9 without issue. Massively different scales, though.

Before k8s, and before SRV support, we used consul-template in prod as well, but it always scared me. It seemed like too many moving pieces for what should've been a simple system.

I asked internally and figured out the gotcha that bit us: the default DNS payload size is 512 bytes, which is enough for a few backend hosts but for sure not 12 or 30. The limit is 8 KB (8192 bytes), which probably wouldn't work for whatever Slack is doing.

https://cbonte.github.io/haproxy-dconv/2.1/configuration.htm....
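
For anyone hitting the same thing, the knob is `accepted_payload_size` in the `resolvers` section. A rough sketch of what raising it looks like is below. The resolver name, nameserver address, service FQDN, slot count, and hold time are all made up for illustration and are not our actual config.

```
# Hypothetical resolvers section; name and address are placeholders.
resolvers k8s
    nameserver dns1 10.96.0.10:53
    # Announce a larger payload size than the 512-byte default (max is 8192).
    accepted_payload_size 8192
    hold valid 10s

# Hypothetical backend with 30 slots filled from an SRV record.
backend app
    server-template pod 30 _http._tcp.my-svc.default.svc.cluster.local resolvers k8s init-addr none check
```

The number of slots in `server-template` should be sized for the largest scale-up you expect, and the payload size has to be large enough for that many records to come back in one response.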

Because DNS records come back in a random order in each response, those truncated DNS responses caused the backend slots to constantly rotate between different pod instances. HAProxy was graceful about the rotations, but it showed up as suddenly very strange latency / perf numbers when a backend was scaled up to, say, 10 instances from the normal 3.
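
To make the rotation effect concrete, here is a toy simulation. It is not HAProxy's actual slot-assignment logic, just a model of "shuffle the records, cut the response off after a few, and see how often the visible set changes". The cutoff of 4 records is an arbitrary stand-in for "what fits in a small payload", not a computed figure, and all names are hypothetical.

```python
import random


def truncated_response(records, max_records, rng):
    """Model one DNS answer: records in random order, cut off at max_records."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    return shuffled[:max_records]


def count_churn(records, max_records, rounds, seed=0):
    """Count backends that newly appear between consecutive responses.

    Each newly appearing backend stands for a slot that would have to be
    re-pointed at a different pod.
    """
    rng = random.Random(seed)
    previous = set(truncated_response(records, max_records, rng))
    churn = 0
    for _ in range(rounds):
        current = set(truncated_response(records, max_records, rng))
        churn += len(current - previous)
        previous = current
    return churn


# Hypothetical scale-up to 10 pods.
pods = [f"pod-{i}" for i in range(10)]

# Whole record set fits in the response: the visible set never changes.
full_churn = count_churn(pods, max_records=10, rounds=100)

# Only 4 records fit (assumed cutoff): the visible set keeps changing.
truncated_churn = count_churn(pods, max_records=4, rounds=100)

print("churn with full responses:", full_churn)
print("churn with truncated responses:", truncated_churn)
```

With full responses the churn stays at zero no matter how the records are ordered, which matches what we saw at 3 instances. Once the response is truncated, the set of pods the proxy can see differs from one lookup to the next.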