HN Mirror



	by monkey26 4515 days ago
	This article caught my interest as I've been reading up on Cassandra. But some previous research had me thinking that Cassandra works best with under a TB/node. Is SQL still better when you have really large nodes (16-32TB) and only really want to scale out for more storage? I'm currently humming along happily with Postgres, but Cassandra's distributed features and availability look really nice.

3 comments

jbellis 4515 days ago

Cassandra 2.0 can handle 5TB per node easily, 10TB with some care. Best to scale out, not up.

That said, if someone else has already made the hardware choice for you, you can always run multiple C* nodes on a single machine. I know several production clusters that fit this description.
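As a rough sketch of the arithmetic behind running multiple instances per machine: the helper below (a hypothetical name, not part of any Cassandra tooling) estimates how many instances a machine would need so that none holds more than a chosen per-node limit. It assumes the 5TB and 10TB figures above are caps on data per instance, and it ignores replication, compaction headroom, and CPU or memory limits.

```python
import math


def instances_per_machine(machine_tb: float, per_node_tb: float = 5.0) -> int:
    """Estimate how many Cassandra instances a machine needs so that
    no single instance holds more than per_node_tb of data.

    Hypothetical helper for illustration only; the 5.0 default is the
    "easy" per-node figure quoted in the comment above.
    """
    if machine_tb <= 0 or per_node_tb <= 0:
        raise ValueError("sizes must be positive")
    # Round up: a partially filled instance still counts as an instance.
    return math.ceil(machine_tb / per_node_tb)


if __name__ == "__main__":
    # The 16TB and 32TB machine sizes come from the original question.
    for machine_tb in (16, 32):
        easy = instances_per_machine(machine_tb, 5.0)
        careful = instances_per_machine(machine_tb, 10.0)
        print(f"{machine_tb}TB machine: {easy} instances at 5TB, {careful} at 10TB")
```

Treat the result as a starting point for capacity planning rather than a sizing recommendation.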


olavgg 4514 days ago

PostgreSQL can handle petabytes easily. However, if you need to query a petabyte of data, then you need to rethink your solution. PrestoDB + Hive + Hadoop may be what you need.


ddorian43 4514 days ago

So can you put petabytes on 1 server? Or can PostgreSQL shard?


krenoten 4514 days ago

It's much more about desired usage patterns than amount of storage. Cassandra and RDBMSs differ quite a lot in how you replicate, consistency guarantees, performant read patterns, performant write patterns, how you handle recovery, etc. If you intend to bring anything to scale, it helps to understand the strengths and weaknesses of the underlying architecture.


cnlwsu 4514 days ago

We run at about 1TB a node and it works well (high write load things like metrics and telemetry data). But we also use SQL Server where appropriate (e.g. transactional account stuff).

I am a fan of using the right tool for the right job, provided you have the team to support it.
