Hacker News
by gregw2 104 days ago
Not to mention that while Parquet fixes the "delimiter problem", it doesn't fix the "encoding problem".

In (simplistic) CSV, you have to pick the right delimiter or it mangles some of your data.
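To make the delimiter problem concrete, here is a minimal sketch using only Python's standard library. The sample line and the variable names are made up for illustration; the point is just that a naive split on the delimiter breaks a quoted field that happens to contain it.

```python
import csv
import io

# Hypothetical row: a product name and an amount with a thousands separator.
line = 'widget,"1,234.56"\n'

# "Simplistic" CSV handling: split on every comma, ignoring quoting.
naive = line.strip().split(",")

# Quote-aware parsing with the standard csv module.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # three fragments, the amount is torn in half
print(parsed)  # two fields, the amount is intact
```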

In Parquet, you have to pick the right data type encoding for each column, or your data gets mangled.

Your clean, fixed-precision monetary decimal data from the source system becomes floating-point slop in your "I didn't want to think about data types"-encoded Parquet file, and then starts behaving differently (or even changing values!) because of floating-point rounding artifacts. Or your blanks become 0s or nulls, etc., etc.
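A rough illustration of the float-vs-decimal point, in plain Python rather than any particular Parquet library (the amounts are invented for the example). Roughly the same thing happens when a decimal column gets written as a double instead of a fixed-precision decimal type:

```python
from decimal import Decimal

# Hypothetical monetary amounts as they might arrive from a source system.
amounts = ["0.10", "0.20"]

# What you get if the column is treated as binary floating point.
as_float = sum(float(a) for a in amounts)

# What you get if the column keeps fixed-precision decimal semantics.
as_decimal = sum(Decimal(a) for a in amounts)

print(as_float)                       # 0.30000000000000004
print(as_float == 0.3)                # False
print(as_decimal)                     # 0.30
print(as_decimal == Decimal("0.30"))  # True
```

The float total is off by a tiny amount, which is enough to break equality checks and reconciliation against the source system.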

And don't get me started on character set encodings!
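For anyone who hasn't been bitten yet, a tiny sketch of the character set problem (the sample string is just an example): the same bytes read with the wrong codec do not raise an error, they quietly turn into different text.

```python
# Text written out as UTF-8 bytes.
raw = "café".encode("utf-8")

# A reader that wrongly assumes Latin-1 gets mojibake, with no exception raised.
wrong = raw.decode("latin-1")

# A reader that uses the matching codec gets the original text back.
right = raw.decode("utf-8")

print(wrong)  # cafÃ©
print(right)  # café
```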