Hacker News
by mgradowski 1662 days ago
Also, you will sleep better at night knowing that your column dtypes are safe from harm, exactly as you stored them. Moving away from CSV (or, god forbid, .xlsx) has been such a quality of life improvement.
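
To make the dtype point concrete, here is a minimal sketch using only Python's standard library. The sample rows and column names are made up for illustration. It shows that a CSV round trip hands every value back as a string, which is the kind of silent type loss a schema-carrying format like Parquet avoids.

```python
import csv
import io

# Hypothetical sample rows: an int, a float, and a bool column.
rows = [
    {"id": 1, "price": 9.5, "in_stock": True},
    {"id": 2, "price": 12.0, "in_stock": False},
]

# Write the rows to an in-memory CSV "file".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "price", "in_stock"])
writer.writeheader()
writer.writerows(rows)

# Read them back. CSV carries no type information, so every value is a str.
buf.seek(0)
roundtrip = list(csv.DictReader(buf))

original_types = {k: type(v).__name__ for k, v in rows[0].items()}
roundtrip_types = {k: type(v).__name__ for k, v in roundtrip[0].items()}

print(original_types)   # {'id': 'int', 'price': 'float', 'in_stock': 'bool'}
print(roundtrip_types)  # {'id': 'str', 'price': 'str', 'in_stock': 'str'}
```

Libraries like pandas try to guess the types back when reading CSV, but a guess is not the same as a stored schema, so the result may not match what was written.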

One thing I miss though is how easy it is to inspect .csv and .xlsx. I kinda solved it using [1], but it only works on Windows. More portable recommendations welcome!

[1] https://github.com/mukunku/ParquetViewer

3 comments

I really like VisiData for "exploring" csv-type data.

It's a vi(m)-inspired tool.

It also handles xls(x), SQLite databases, and a bunch of other random things, and it appears to support Parquet via pandas:

https://www.visidata.org/docs/loading/

Nice one!
I used to use Zeppelin, a kind of Jupyter Notebook for Spark (that supports Parquet). But there may be better alternatives.

https://zeppelin.apache.org/

The "real" format being binary with debugging tools is absolutely the best way to go. For example, you can use `nio print` (or even just `nio p`) in the Nim multi-program https://github.com/c-blake/nio to get "text debugging output" of binary files.