Hacker News
by meritt 2220 days ago
pandas + fastparquet is fairly lightweight, but yes, I would love to see a simple C++/Go binary that's just a single csv2parq call.

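Newer versions of pandas don't need fastparquet specifically anymore: to_parquet uses pyarrow by default when it's installed and only falls back to fastparquet otherwise. This code works: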

import pandas as pd

df = pd.read_csv('data/us_presidents.csv')
df.to_parquet('tmp/us_presidents.parquet')

Nice! Does that work alongside reading in via chunks and writing via row_groups? If I have a 500GB CSV will it work?