by efuquen
4032 days ago
I had way more trouble with Consul than I ever had with etcd. Actually, I've had almost no trouble with etcd whatsoever; it was far more resilient and tolerant of dying machines than Consul, which would repeatedly get into inconsistent states and try to connect to nodes that were no longer there. I've been running CoreOS in production on dozens of AWS instances over the past year and I really don't have many complaints. Most of the issues I've come across have a lot more to do with Docker than with the stuff CoreOS has built. But I also handle upgrades differently: I never trusted their update system from the beginning, and it's easy enough to disable it and stand up new instances from upgraded CoreOS images.

Also, looking at your quote, I would consider it very out of context. Here is the sentence right before it: "If there were any changes to these etcd machines, AWS would reboot them to apply the changes, potentially all at the same time." So they had CloudFormation potentially rebooting all their machines at the same time. I think any cluster is going to have an issue when that happens, and it really has nothing to do with CoreOS's update system.
> Also, looking at your quote I would consider it very out of context, the previous sentence right before that:
> "If there were any changes to these etcd machines, AWS would reboot them to apply the changes, potentially all at the same time."
> So they had CloudFormation potentially rebooting all their machines at the same time, I think any cluster is going to have an issue when that happens and really has nothing to do with CoreOS's update system.
Two things:
(a) I'm saying that etcd has a tendency to break in _exactly the same way_ without AWS rebooting anything.
(b) Production systems have a tendency to fail completely and then all power back on at the same time (or to all come out of a network partition at once). Anything as essential as etcd claims to be absolutely must be able to handle the case where every machine has been powered off or cut off from the others, and then that situation suddenly ends.
CoreOS's update system happens to trigger this on its own, because when it updates, it relies on etcd.
If you're not going to rely on CoreOS to update itself, what in the world is the point of CoreOS?
I'm just saying there are other boxes of sticks, putting some sticks in a box ain't that fuckin' hard, and these particular stick-gatherers are suffering from a dangerous bout of megalomania.
Feel free to lean your livelihood up against whatever box of sticks you please. :)