by amypinka
2624 days ago
In benchmarks I've seen, CStore is about 50% slower than Parquet on Spark. Where is the transactional requirement? This person is working with a copy of the real data. ETLs only need to be written once, and if he decided on a PSQL approach he'd be writing ETLs to send the data there too. He's probably going to find a number of consistency problems, so trying to normalise all this data again will just result in more work that won't make his team of data scientists more productive. If he's at ~1 TB of data today, where will he be in a few years' time? What's the point of putting infrastructure in place that won't last for the next 10+ years?
I don’t know that cstore is slower at all at 100GB. Nor do I know that it matters for the use case. Spark runs like a dog on a single machine and requires far more resources to do so. PG also has options like PG-Strom for GPU acceleration, if speed is even a thing.
Also, ETL is rarely written once; it’s an ongoing body of work that changes as the data does.