| The basis of this was being pointed in the right direction by the community. etcd had a HUGE issue with the implementation of the Raft consensus algorithm they were using. This was in version 0.x. The tough part was that, even though etcd 2.0 was released in January [1], it was not put into CoreOS alpha until April [2]. After moving to 2.x, all my problems went away. There was a small learning curve around setting up lots of nodes in the cluster vs. proxies [3]. 2.x added a lot of functionality, but the main thing for us was its reliability, along with being able to query the status of members, add/remove members from the cluster, and monitor it. Before etcd 2.x, the whole etcd infrastructure would die (and consequently, fleet) if just ONE node restarted. Needless to say, it's come a long way. We've been running etcd 2.x in a container [4] since January, then just doing export FLEETCTL_ENDPOINT=http://127.0.0.1:2379 [1] - https://coreos.com/blog/etcd-2.0-release-first-major-stable-... [2] - https://coreos.com/blog/coreos-alpha-with-etcd-2/ [3] - https://coreos.com/etcd/docs/latest/admin_guide.html [4] - https://coreos.com/blog/Running-etcd-in-Containers/ |
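For anyone wanting to try the same setup, here is a rough sketch of what the export plus the member queries can look like. It assumes etcd 2.x is listening on the local client URL from the comment above and that curl is installed. The helper names (etcd_status, etcd_member_add, etcd_member_remove) are made up for illustration; the /health and /v2/members paths are the etcd 2.x HTTP API as I understand it, so check them against the admin guide [3] for your version.

```shell
# Assumed client URL of the local (containerized) etcd 2.x member.
ETCD_ENDPOINT="${ETCD_ENDPOINT:-http://127.0.0.1:2379}"

# Point fleetctl at that endpoint, same as the export in the comment.
export FLEETCTL_ENDPOINT="$ETCD_ENDPOINT"

# Hypothetical helper: print the health of this member and the
# current member list as JSON.
etcd_status() {
    curl -fsS "$ETCD_ENDPOINT/health"
    echo
    curl -fsS "$ETCD_ENDPOINT/v2/members"
    echo
}

# Hypothetical helper: register a new member by its peer URL ($1).
etcd_member_add() {
    curl -fsS -X POST -H 'Content-Type: application/json' \
        -d "{\"peerURLs\":[\"$1\"]}" "$ETCD_ENDPOINT/v2/members"
}

# Hypothetical helper: remove a member by its ID ($1), as shown
# in the member list.
etcd_member_remove() {
    curl -fsS -X DELETE "$ETCD_ENDPOINT/v2/members/$1"
}
```

After sourcing this, etcd_status should show whether the local member is healthy, and fleetctl list-machines should work without any extra flags, since fleetctl reads FLEETCTL_ENDPOINT from the environment.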