by wenc
687 days ago
There are folks who still directly query CSV files in a data lake using a query engine like Athena, Spark, or Redshift Spectrum, which ends up being much slower and consuming more resources than necessary because every query is a full table scan. CSV is only good for append-only workloads. But so is Parquet, and if you can write Parquet from the get-go, you save on storage as well as have a directly queryable column store from the start. CSV still exists because of legacy data-generating processes and a dearth of Parquet familiarity among many software engineers. CSV is simple to generate and easy to troubleshoot without specialized tools (compared to Parquet, which requires tools like VisiData). But you pay for it elsewhere.
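
To make the full-scan point concrete, here is a toy sketch in plain Python. It is not Parquet: the one-file-per-column layout, the file names, and the helper functions are all made up for illustration, and real Parquet adds row groups, encodings, compression, and footer statistics on top. It only shows why summing one column from a row-oriented CSV means reading every byte of every row, while a column-oriented layout reads just the column it needs.

```python
import csv
import os
import tempfile


def write_csv(path, rows, header):
    # Row-oriented: all fields of a record sit next to each other on one line.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def write_columnar(directory, rows, header):
    # Toy column store (hypothetical layout): one file per column, one value per line.
    for i, name in enumerate(header):
        with open(os.path.join(directory, name + ".col"), "w") as f:
            for row in rows:
                f.write(f"{row[i]}\n")


def sum_from_csv(path, column):
    # The whole file is read and parsed just to reach a single field per row.
    with open(path, newline="") as f:
        total = sum(int(rec[column]) for rec in csv.DictReader(f))
    return total, os.path.getsize(path)


def sum_from_columnar(directory, column):
    # Only the file holding the requested column is touched.
    col_path = os.path.join(directory, column + ".col")
    with open(col_path) as f:
        total = sum(int(line) for line in f)
    return total, os.path.getsize(col_path)


def demo(n_rows=1000):
    header = ["id", "user", "comment", "amount"]
    rows = [(i, f"user{i}", "x" * 50, i % 7) for i in range(n_rows)]
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "events.csv")
        write_csv(csv_path, rows, header)
        write_columnar(tmp, rows, header)
        csv_total, csv_bytes = sum_from_csv(csv_path, "amount")
        col_total, col_bytes = sum_from_columnar(tmp, "amount")
    return csv_total, csv_bytes, col_total, col_bytes


if __name__ == "__main__":
    csv_total, csv_bytes, col_total, col_bytes = demo()
    print(f"CSV scan:    sum={csv_total}, bytes read={csv_bytes}")
    print(f"Column read: sum={col_total}, bytes read={col_bytes}")
```

Both paths produce the same sum, but the columnar read touches a small fraction of the bytes. On a query engine that bills or throttles by data scanned, that gap is roughly where the "you pay for it elsewhere" shows up.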