| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by usgroup 2624 days ago

The RDBMS advantage is that you can update your records and you can append to them without having to rewrite the dataset. That makes ETL much easier. Eg recalculate a column. It’s also that referential constraints can make sure your database is coherent for you. This saves a lot of time and a lot of mistakes. You also get well thought through scheme management and other benefits besides. Pg11 will scale happily to 10x his requirement. I don’t see why you’d want to build infrastructure for the next 10 years on Spark... since Spark is unlikely to be the thing by then anyway.

I don’t know about cstore being slower at all at 100GB. Nor do I know that it matters for the use case. Spark runs like a dog on a single machine and requires far more resource to do so. PG also has options like pgstrom for gpu acceleration if speed is even s thing.

Also EtL is rarely written once ... it’s an ongoing body of work that changes as the data does.