| HN Mirror

by mgradowski 1662 days ago

Also, you will sleep better at night knowing that your column dtypes are safe from harm, exactly as you stored them. Moving away from CSV (or, god forbid, .xlsx) has been such a quality-of-life improvement.
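To make the dtype point concrete, here is a minimal sketch of the CSV side of the problem, using only the Python standard library. The sample rows and column names are made up for illustration; the point is just that a CSV round trip hands every value back as a string.

```python
import csv
import io

# Hypothetical sample rows with an int, a float, and a bool column.
rows = [
    {"id": 1, "price": 9.5, "in_stock": True},
    {"id": 2, "price": 12.0, "in_stock": False},
]

# Write the rows to an in-memory CSV "file".
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "price", "in_stock"])
writer.writeheader()
writer.writerows(rows)

# Read them back: CSV carries no schema, so every value comes back as str.
buf.seek(0)
restored = list(csv.DictReader(buf))

original_types = {k: type(v).__name__ for k, v in rows[0].items()}
restored_types = {k: type(v).__name__ for k, v in restored[0].items()}

print(original_types)  # {'id': 'int', 'price': 'float', 'in_stock': 'bool'}
print(restored_types)  # {'id': 'str', 'price': 'str', 'in_stock': 'str'}
```

A Parquet file stores its schema alongside the data, so a reader does not have to guess the column types again on load.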

One thing I miss though is how easy it is to inspect .csv and .xlsx. I kinda solved it using [1], but it only works on Windows. More portable recommendations welcome!

[1] https://github.com/mukunku/ParquetViewer

3 comments

ZeroGravitas 1662 days ago

I really like VisiData for "exploring" CSV-type data.

It's a vi(m)-inspired tool.

It also handles xls(x), SQLite databases, and a bunch of other random things, and it appears to support Parquet via pandas:

https://www.visidata.org/docs/loading/

mgradowski 1662 days ago

Nice one!

speedgoose 1662 days ago

I used to use Zeppelin, a kind of Jupyter Notebook for Spark (which supports Parquet). But there may be better alternatives.

https://zeppelin.apache.org/

cb321 1662 days ago

The "real" format being binary with debugging tools is absolutely the best way to go. For example, you can use `nio print` (or even just `nio p`) in the Nim multi-program https://github.com/c-blake/nio to get "text debugging output" of binary files.
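The general idea (binary on disk, a small tool that prints it as text on demand) can be sketched in a few lines of Python. This is not how nio works internally and does not use its file format; the record layout and the `write_records` / `dump_records` helpers are hypothetical, chosen only to illustrate the pattern.

```python
import os
import struct
import tempfile

# Hypothetical fixed-width record layout: little-endian int32 id + float64 value.
RECORD = struct.Struct("<id")  # 4 + 8 = 12 bytes per record, no padding


def write_records(path, records):
    """Pack each (id, value) pair into raw bytes and write it to the file."""
    with open(path, "wb") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))


def dump_records(path):
    """Decode the binary file into tab-separated text lines for inspection."""
    lines = []
    with open(path, "rb") as f:
        while chunk := f.read(RECORD.size):
            if len(chunk) != RECORD.size:
                raise ValueError("truncated record at end of file")
            rec_id, value = RECORD.unpack(chunk)
            lines.append(f"{rec_id}\t{value}")
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.bin")
        write_records(path, [(1, 0.5), (2, 2.25)])
        print("\n".join(dump_records(path)))
```

The stored file stays compact and exactly typed, while the text view exists only when someone asks for it.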
