The RDBMS advantage is that you can update your records and append to them without having to rewrite the dataset. That makes ETL much easier, e.g. when you need to recalculate a column. Referential constraints can also keep your database coherent for you, which saves a lot of time and a lot of mistakes. You also get well-thought-through schema management and other benefits besides. PG11 will scale happily to 10x his requirement. I don’t see why you’d want to build infrastructure for the next 10 years on Spark, since Spark is unlikely to be the thing by then anyway.
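To make the update-in-place and constraint points concrete, here's a rough sketch. It uses Python's built-in sqlite3 as a stand-in so it runs anywhere; the same SQL ideas carry over to Postgres. The `customers` and `orders` tables and the 1.5 multiplier are made up for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in for Postgres (assumption, for portability).
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when asked; Postgres does so by default.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: orders must point at an existing customer.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " net REAL NOT NULL,"
    " gross REAL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'acme')")
conn.executemany(
    "INSERT INTO orders (customer_id, net) VALUES (?, ?)",
    [(1, 100.0), (1, 250.0)],
)

# Recalculate a column in place: no need to rewrite the whole dataset.
conn.execute("UPDATE orders SET gross = net * 1.5")

# Append a new record to the existing table.
conn.execute("INSERT INTO orders (customer_id, net, gross) VALUES (1, 40.0, 60.0)")

# The referential constraint rejects a row that points at a missing customer.
try:
    conn.execute("INSERT INTO orders (customer_id, net) VALUES (999, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

rows = conn.execute("SELECT net, gross FROM orders ORDER BY id").fetchall()
```

With file-based datasets the equivalent of that `UPDATE` is typically a job that reads and rewrites the affected files, and nothing stops the orphan row from landing unless you write the check yourself.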

I don’t know that cstore is slower at all at 100GB, nor do I know that it matters for the use case. Spark runs like a dog on a single machine and requires far more resources to do so. PG also has options like PG-Strom for GPU acceleration if speed is even a thing.

Also, ETL is rarely written once... it’s an ongoing body of work that changes as the data does.