by bogomipz 3527 days ago
The OP states under "Why not Consul":

"Consul agent takes part in Raft, so doing infrastructure discovery in your whole environment means every system takes part in Raft"

The Consul agent runs in two modes: server and client. From the Consul docs:

"Client - A client is an agent that forwards all RPCs to a server. The client is relatively stateless. The only background activity a client performs is taking part in the LAN gossip pool. This has a minimal resource overhead and consumes only a small amount of network bandwidth."

The client doesn't take part in the consensus protocol. This seems to be the OP's whole reason for choosing etcd over Consul, and I don't believe the OP's claim is correct.
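
To make the distinction concrete, here is a toy model (not Consul code; the `Agent` class, the helper names, and the cluster sizes are all made up for illustration). It only shows that the Raft peer set, and therefore the quorum size, depends on the server-mode agents, no matter how many clients join the gossip pool:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    mode: str  # "server" or "client" (hypothetical labels for this sketch)


def raft_peers(agents):
    """Return only the server-mode agents; clients are not Raft peers."""
    return [a for a in agents if a.mode == "server"]


def quorum_size(agents):
    """Majority of the Raft peer set: floor(n / 2) + 1."""
    return len(raft_peers(agents)) // 2 + 1


# Hypothetical environment: 3 servers, 100 clients.
cluster = [Agent(f"server-{i}", "server") for i in range(3)]
cluster += [Agent(f"client-{i}", "client") for i in range(100)]

print(len(cluster), len(raft_peers(cluster)), quorum_size(cluster))
# prints: 103 3 2
```

Adding another thousand clients to `cluster` would leave the last two numbers unchanged, which is the point of the quoted docs passage.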


I personally think the OP's reason to use etcd comes down to two things: 1.) the ecosystem it integrates into, 2.) only maintaining one SR technology.

I think that with the upcoming Kubernetes trend, many ops guys are already somewhat familiar with the CoreOS stack, where etcd fits in very well. The HashiCorp toolchain is great, but I've seen many ops guys use only Terraform or Vault, for example, without building an integrated environment with Nomad or Atlas.

The CoreOS stacks I've seen are more tightly integrated into platforms than the HashiCorp ones (which can be a good and a bad thing at once).

>"The HashiCorp toolchain is great, but I've seen many ops guys use only Terraform or Vault, for example, without building an integrated environment with Nomad or Atlas"

I think this is one of the nice parts about HashiCorp releases, though: they are building blocks, and you can choose to use them as standalone pieces (Consul), as subsets (Consul + Vault), or all of them in an integrated stack with Nomad and Atlas.

I agree with you, though: if you are a CoreOS shop, it certainly makes sense to choose etcd.

You are correct and I came here to say the same. Additionally, I've found etcd a pain to manage: I've seen split-cluster situations where the data becomes corrupted and has to be merged back together manually. That is a bad situation to be in, and it is usually better to design for failure, especially in cloud environments.

You can get split-brain problems with any replication that relies on cluster agreement; etcd is not a special case. These failures are the worst case for any technology like that and will always cause a bit of pain in production. But since the etcd guys are very open about how etcd behaves in case of failures and how they do quality testing of it, I started to trust etcd: https://coreos.com/blog/new-functional-testing-in-etcd/

I have read a blog post about continuous testing of Consul at high scale too (can't find the link right now), but I don't remember getting this much insight from it.

There is a dashboard that publishes the test results in real time at http://dash.etcd.io/dashboard/db/functional-tests. The results are not super clear right now; we are improving it.