HN Mirror


by EdwardDiego 475 days ago

What query engine are you using?

The optimal file size for Parquet tends to be about 1 GiB; once again, Hadoop's "many small files" problem remains.
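To make the file-size point concrete, here is a rough sketch of planning a compaction: grouping many small Parquet files into batches of about 1 GiB, each of which would then be rewritten as a single file. The helper name `plan_compaction`, the file names, and the 64 MiB sizes are all made up for illustration; the actual rewrite would be done by whatever engine you use.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def plan_compaction(file_sizes, target_bytes=GIB):
    """Greedily group (name, size_in_bytes) pairs into batches of at most target_bytes.

    Hypothetical helper: files already at or above the target get a batch of
    their own, since rewriting them gains nothing.
    """
    batches = []
    current, current_size = [], 0
    # Largest first, so big files are handled before the small ones are packed.
    for name, size in sorted(file_sizes, key=lambda f: f[1], reverse=True):
        if size >= target_bytes:
            batches.append([name])
            continue
        # Close the current batch if this file would push it past the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Assumed input: 40 small files of 64 MiB each (2.5 GiB in total).
small_files = [(f"part-{i:04d}.parquet", 64 * MIB) for i in range(40)]
batches = plan_compaction(small_files)
print(len(batches), [len(b) for b in batches])  # 3 [16, 16, 8]
```

With these assumed sizes, 40 files become 3, which is the kind of reduction that matters when per-file overhead dominates query planning.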

Then there are questions like: can you organise your data so that it takes advantage of RLE (run-length encoding) and similar encodings?
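A toy illustration of why data layout matters for RLE: the same low-cardinality column collapses into far fewer runs once it is sorted. This is plain Python, not Parquet's actual encoder (Parquet uses a hybrid of RLE and bit-packing, typically over dictionary indices), and the `country` column is invented for the example.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


# Assumed column: 3,000 rows cycling through three distinct values,
# so no two neighbouring rows are equal.
country = ["DE", "NZ", "US"] * 1000

unsorted_runs = run_length_encode(country)
sorted_runs = run_length_encode(sorted(country))

print(len(unsorted_runs))  # 3000
print(sorted_runs)         # [('DE', 1000), ('NZ', 1000), ('US', 1000)]
```

The interleaved layout needs one run per row, while the sorted layout needs one run per distinct value. Sorting or clustering on low-cardinality columns before writing is the usual way to get closer to the second case.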

1 comment

Either Spark or Redshift (serverless)