by elmolino89
1088 days ago
xsv is great for quick sanity checks (e.g. number of columns, unique value counts in a given column), but for more serious tasks or giant files I switch to either polars or duckdb, converting the CSV/TSV files to parquet or parquet datasets. By giant I mean 25G gzipped files with >10^9 rows, like these VCFs: https://ftp.ncbi.nlm.nih.gov/snp/latest_release/VCF/
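
For anyone wondering what those quick sanity checks amount to, here is a rough standard-library Python sketch of the same idea: one streaming pass over a gzipped TSV that tallies fields per row and value counts for one column. This is not how xsv does it. The function name `sanity_check` and the sample data are made up for illustration, and it assumes a plain header row (a real VCF has `##` meta lines before the header that you would need to skip first).

```python
import csv
import gzip
import io
from collections import Counter


def sanity_check(stream, column, delimiter="\t"):
    """Read a delimited text stream once.

    Returns (widths, values): a Counter of fields-per-row (header included)
    and a Counter of the values seen in `column`.
    """
    reader = csv.reader(stream, delimiter=delimiter)
    header = next(reader)
    idx = header.index(column)          # ValueError if the column is missing
    widths = Counter({len(header): 1})  # count the header row too
    values = Counter()
    for row in reader:
        widths[len(row)] += 1
        if idx < len(row):              # tolerate short (ragged) rows
            values[row[idx]] += 1
    return widths, values


# Tiny in-memory gzipped TSV standing in for a real file (made-up data).
raw = "chrom\tpos\tref\n1\t100\tA\n1\t200\tG\n2\t300\tA\n"
blob = gzip.compress(raw.encode("utf-8"))

# gzip.open accepts a path or a file object; text mode decodes on the fly.
with gzip.open(io.BytesIO(blob), mode="rt", newline="") as fh:
    widths, values = sanity_check(fh, "ref")

print(dict(widths))           # more than one key means ragged rows
print(values.most_common())   # most frequent values first
```

It reads one row at a time, so memory stays flat apart from the value counter, which grows with the number of distinct values. Pure Python will be far slower than xsv on a 10^9-row file, which is part of why I hand the big ones to polars or duckdb.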
sqlp: Run blazing-fast Polars SQL queries against several CSVs - converting queries to fast LazyFrame expressions, processing larger than memory CSV files.
to: Convert CSV files to PostgreSQL, SQLite, XLSX, Parquet and Data Package.
[1] https://github.com/jqnatividad/qsv/blob/master/src/cmd/sqlp....
[2] https://github.com/jqnatividad/qsv/blob/master/src/cmd/to.rs...