by WookieRushing
1345 days ago
I was also surprised by this. 300K nodes for a distributed DB is kind of crazy. I've worked with similar systems, but they stored much more than 100 PB with 10x fewer nodes. Apple is using less than one TB per server… But when you see the 1000s of clusters, it starts to make sense. They probably use a Cassandra cluster as the default storage for any use case, and each one probably requires at least 3 nodes. They're keeping the blast radius of any issue small while being super redundant. It probably grew organically rather than through any central capacity management.
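For a rough sense of the numbers, here is a back-of-the-envelope sketch. It assumes roughly 100 PB in total (implied by the comparison above, not a confirmed figure) spread evenly over 300K nodes, using decimal units; the function name is just for illustration.

```python
# Back-of-the-envelope check. Both inputs are assumptions taken from the
# discussion, not published figures.
TOTAL_DATA_PB = 100      # assumed total stored data, in petabytes
NODE_COUNT = 300_000     # node count mentioned in the thread
TB_PER_PB = 1000         # decimal units


def avg_tb_per_node(total_pb: float, nodes: int) -> float:
    """Average data per node in terabytes, assuming an even spread."""
    return total_pb * TB_PER_PB / nodes


per_node = avg_tb_per_node(TOTAL_DATA_PB, NODE_COUNT)
print(f"{per_node:.2f} TB per node")                 # 0.33 TB per node

# The same data on 10x fewer nodes, for comparison.
denser = avg_tb_per_node(TOTAL_DATA_PB, NODE_COUNT // 10)
print(f"{denser:.2f} TB per node with 10x fewer nodes")
```

So under those assumptions the average comes out to about a third of a TB per node, which is consistent with "less than one TB per server".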
|
Half a TB per node, which can double during regular compaction. And if you went over, your CPU and disk spent so much time on overhead such as JVM garbage collection that your compaction processes backlogged, your node got slower and slower, your disk eventually filled up, and it fell over. Later things got better and you could use bigger nodes if you knew what you were doing and didn't trip over any of the hidden bottlenecks in your workload. Maybe it's even fixed in the last few versions of Cassandra, 3.x and 4.0.
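The headroom point can be put as a quick check. This is only a sketch: it treats "can double" as a flat worst-case factor of 2, which is a simplification (actual compaction overhead depends on the compaction strategy and workload), and the function names are made up for illustration.

```python
def peak_disk_tb(data_tb: float, compaction_factor: float = 2.0) -> float:
    """Worst-case disk usage while compaction is rewriting data.

    compaction_factor=2.0 is an assumption based on "can double" above.
    """
    return data_tb * compaction_factor


def fits_on_disk(data_tb: float, disk_tb: float,
                 compaction_factor: float = 2.0) -> bool:
    """True if the node still has room at peak compaction usage."""
    return peak_disk_tb(data_tb, compaction_factor) <= disk_tb


print(peak_disk_tb(0.5))          # 1.0
print(fits_on_disk(0.5, 1.0))     # True
print(fits_on_disk(0.75, 1.0))    # False
```

In other words, half a TB of data already wants about a full TB of disk, and anything past that starts eating into the margin.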