by p33p 1558 days ago
> On the other hand parquet becomes a turtle if one tries to squeeze i.e. 12k numerical columns into it.

I thought parquet was a columnar format? Is this a fault of parquet itself, or just the sheer number of columns being accessed?

I agree with your general premise though. I'd rather take a dirty dataset, throw it into S3, spin up a Redshift cluster, do what I need, and spin the cluster down. You can work with billions of records fairly easily with plain old SQL and column-store databases.
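
On the column-count question, here is a rough sketch of one thing that might be going on. This is a toy columnar layout I made up for illustration, not the real Parquet format, and `write_toy_columnar`, `read_one_column` and `make_table` are hypothetical names. The point it tries to show: even when you read a single column, the reader first has to parse a footer that describes every column, and that footer grows with the column count. Real Parquet similarly keeps metadata for each column chunk of each row group in the file footer, so this may be part of the slowdown with 12k columns, though I haven't measured it.

```python
import json
import struct


def write_toy_columnar(columns):
    """Serialize {name: [floats]} as column chunks followed by a JSON footer.

    Toy layout (an assumption, NOT real Parquet):
    [chunk0][chunk1]...[footer JSON][4-byte footer length]
    """
    body = bytearray()
    footer = {}
    for name, values in columns.items():
        chunk = struct.pack(f"<{len(values)}d", *values)
        # Record where this column lives so a reader can seek straight to it.
        footer[name] = {"offset": len(body), "length": len(chunk), "count": len(values)}
        body += chunk
    footer_bytes = json.dumps(footer).encode("utf-8")
    return bytes(body) + footer_bytes + struct.pack("<I", len(footer_bytes))


def read_one_column(blob, name):
    """Return (values, footer_size_in_bytes) for a single column."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    # The whole footer is parsed even though only one column is wanted.
    footer = json.loads(blob[-4 - footer_len:-4])
    meta = footer[name]
    chunk = blob[meta["offset"]:meta["offset"] + meta["length"]]
    values = list(struct.unpack(f"<{meta['count']}d", chunk))
    return values, footer_len


def make_table(n_cols, n_rows=4):
    """Build a small synthetic table with n_cols numeric columns."""
    return {f"c{i}": [float(i + r) for r in range(n_rows)] for i in range(n_cols)}


narrow = write_toy_columnar(make_table(10))
wide = write_toy_columnar(make_table(12_000))

values_narrow, footer_narrow = read_one_column(narrow, "c3")
values_wide, footer_wide = read_one_column(wide, "c3")

print("same column data:", values_narrow == values_wide)
print("footer bytes, 10 columns:   ", footer_narrow)
print("footer bytes, 12000 columns:", footer_wide)
```

The column data read back is identical in both cases; only the amount of metadata that has to be parsed first changes. So in this toy model at least, the layout being columnar doesn't help with the per-column bookkeeping.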