| HN Mirror


by WookieRushing 1392 days ago

I was also surprised by this. 300K nodes for a distributed DB is kind of crazy. I’ve worked with similar systems but they stored much more than 100 PB with 10x fewer nodes

Apple is using less than one TB per server…
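A quick back-of-the-envelope check of that per-node figure, assuming the roughly 100 PB and 300K node numbers mentioned above and decimal units (1 PB = 1000 TB). The function name is only for illustration:

```python
def tb_per_node(total_pb: float, nodes: int) -> float:
    """Average data per node in TB, using decimal units (1 PB = 1000 TB)."""
    return total_pb * 1000 / nodes


# Figures from the thread: roughly 100 PB spread across 300K nodes.
print(f"{tb_per_node(100, 300_000):.2f} TB per node")  # 0.33 TB per node
```

This is only an average; it says nothing about how data is actually distributed across clusters.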

But when you see the 1000s of clusters it starts to make sense. They probably have a Cassandra cluster as their default storage for any use case and each one probably requires at least 3 nodes. They’re keeping the blast radius of any issue small while being super redundant. It probably grew organically rather than through any central capacity management


stubish 1392 days ago

What you describe is best practice for older versions of Cassandra with older versions of the Oracle JVM on spinning disks. And at that time Apple already had a massive number of Cassandra nodes, back when 1TB disks were what we had started buying for our servers. Cassandra was designed to run on large numbers of cheap x86 boxes, unlike most other DBs where people had to spend hundreds of thousands or millions of dollars on mainframes and storage arrays to scale their DBs to the size they needed.

The guideline was about half a TB per node, which can double during regular compaction. If you went over, your CPU and disk spent so much time on overhead such as JVM garbage collection that your compaction processes backed up, your node got slower and slower, your disk eventually filled up, and it fell over. Later things got better and you could use bigger nodes if you knew what you were doing and didn't trip over any of the hidden bottlenecks in your workload. It may even have been fixed in the last few versions of Cassandra 3.x and 4.0.
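A rough sketch of the headroom arithmetic behind that half-a-TB figure. The factor of 2 is just the worst-case doubling described in the comment, taken as an assumption; the real overhead depends on the compaction strategy and workload. The function name is hypothetical:

```python
def fits_during_compaction(data_tb: float, disk_tb: float,
                           overhead_factor: float = 2.0) -> bool:
    """True if the disk can hold the data while compaction temporarily inflates it.

    overhead_factor is an assumed worst case (data doubling), not a Cassandra setting.
    """
    return data_tb * overhead_factor <= disk_tb


print(fits_during_compaction(0.5, 1.0))  # True: half a TB on a 1 TB disk just fits
print(fits_during_compaction(0.8, 1.0))  # False: no room left for compaction
```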


WookieRushing 1392 days ago

What psaux mentioned makes more sense. A node == one Cassandra agent instead of a server.

Past 100k servers you start needing really intense automation just to keep the fleet up with enough spares.

If you’ve got say 10k servers it’s much more manageable

The fun thing is Cassandra was born at FB but they don’t run any Cassandra clusters there anymore. You can use lots of cheap boxes but at some point the failure rate of using so many boxes ends up killing the savings and the teams.


stubish 1391 days ago

Yes, you can run multiple nodes on a single physical server. However, then you have the additional headache of ensuring that only one copy of data gets stored on that physical server, or else you can lose your data if that server dies. It is similar to having multiple nodes backed by the same storage system, where you need to ensure that losing a disk or volume doesn't lose two or more copies of data. Cassandra lets you organize your replicas into 'data centers', and gives you some control inside a DC by allocating nodes to 'racks' (with some major gotchas when resizing, so not recommended!). Translating that into VMs running on physical servers and shared disk is (was?) not documented.
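To make the failure mode concrete, here is a minimal sketch of the check you would want: given which physical server each node runs on, flag any server holding more than one replica of the same data. This is not Cassandra's API; the function, node names, and server names are all hypothetical:

```python
from collections import Counter


def hosts_with_duplicate_replicas(replica_nodes, node_to_host):
    """Return physical hosts that hold more than one replica of the same data."""
    counts = Counter(node_to_host[node] for node in replica_nodes)
    return sorted(host for host, n in counts.items() if n > 1)


# Hypothetical layout: four Cassandra nodes spread over three physical servers.
node_to_host = {"n1": "server-a", "n2": "server-a", "n3": "server-b", "n4": "server-c"}

# Replicas on n1 and n2 share server-a, so losing that server loses two copies.
print(hosts_with_duplicate_replicas(["n1", "n2", "n3"], node_to_host))  # ['server-a']
# One replica per physical server: nothing to flag.
print(hosts_with_duplicate_replicas(["n1", "n3", "n4"], node_to_host))  # []
```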


rockwotj 1392 days ago

> The fun thing is Cassandra was born at FB but they don’t run any Cassandra clusters there anymore.

Isn't Instagram mostly Cassandra?

https://instagram-engineering.com/open-sourcing-a-10x-reduct...


WookieRushing 1392 days ago

It wasn’t when I last saw it. Rocksandra ended up being a stepping stone to FB's most common distributed DB, ZippyDB: https://engineering.fb.com/2021/08/06/core-data/zippydb/

Zippydb is honestly one of the best parts of fb infra. It let you select levels of consistency vs latency


petersellers 1391 days ago

> Zippydb is honestly one of the best parts of fb infra. It let you select levels of consistency vs latency

How is that different from Cassandra's Tunable consistency model?

https://cassandra.apache.org/doc/4.1/cassandra/architecture/...
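For context, the usual rule of thumb behind Cassandra's tunable consistency is that a read is guaranteed to overlap the latest acknowledged write when read replicas plus write replicas exceed the replication factor. A small sketch of that rule (helper names are hypothetical, and this says nothing about how ZippyDB does it):

```python
def quorum(rf: int) -> int:
    """Number of replicas in a majority quorum for replication factor rf."""
    return rf // 2 + 1


def read_overlaps_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return write_acks + read_acks > rf


rf = 3
q = quorum(rf)
print(q)                               # 2
print(read_overlaps_write(rf, q, q))   # True: QUORUM writes with QUORUM reads
print(read_overlaps_write(rf, 1, 1))   # False: ONE/ONE trades consistency for latency
```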


shrubble 1391 days ago

Seagate introduced 2TB drives no later than 2010.


stubish 1391 days ago

Interestingly, using the highest capacity drives at any point in time would work even worse, since they spun slower and had slower sequential write speeds. That is, if you could get them from your preferred vendor, which seemed to be several years after introduction for us!
