by duijf
2445 days ago
Re availability: We had a hard time keeping the system based on Spark available. There were days when the cluster would freak out multiple times in a single day. The 'fix' would be: restart a bunch of Spark workers. We spent a lot of time debugging/finding this out (some parts documented in [1]) but couldn't work out what the problem was. (EDIT: Assuming there even was a single problem.)

In this particular case, I'd take the single point of failure over the previous situation.

That being said: we have successfully used PostgreSQL's fail-overs multiple times. In my experience, they work quite alright.

[1]: https://tech.channable.com/posts/2018-04-10-debugging-a-long...

At $previous_job we had a "one service" = "one MySQL instance" policy. Every time a MySQL server went down, all of its clients lost access to that service at the same time. It was stressful and much less robust than your setup.