by wenc
1284 days ago
> It's so complex to work with

This is the opposite of my experience.

> To read a parquet file in Python, you need Apache Arrow and Pandas.

Or DuckDB:

    import duckdb
    # returns a DuckDB relation; call .df() for a pandas DataFrame or .fetchall() for plain tuples
    df = duckdb.query("select * from 'a.parquet'")

Want to look inside a Parquet file? Use VisiData:

    vd a.parquet

> I remember dealing with Parquet file for a job a while back and this same question came up: Why isn't there a simpler way, for when you're not in the data science stack and you just need to convert a parquet file to csv/json/read rows? Is it a limitation of the format itself?

Do you consider Pandas a "data science" stack? To me, it's just a library like any other that makes it easy to work with tabular data. Even for CSV, there is csv.reader (it's usually not a good idea to deal with CSV by hand). Outputting to CSV is literally a one-liner in Pandas or DuckDB.

    import pandas as pd

    # output to CSV (index=False drops the DataFrame's row index column)
    pd.read_parquet("a.parquet").to_csv("a.csv", index=False)

    # output to JSON (choose from any number of orientations)
    pd.read_parquet("a.parquet").to_json("a.json", orient="table")

    # read rows
    for row in pd.read_parquet("a.parquet").itertuples():
        print(row)