|
|
|
|
|
by ryanwaldorf
746 days ago
|
|
Fair callouts. For the first version I was looking to solve the basic use case of "I've downloaded a standard CSV from one database and I want to ingest it into another". I'll look at adding the ability to handle non-standard CSVs as well. Also I appreciate you pointing out what Snowflake and BigQuery have with this. I'm wondering if there's something Redshift-specific that may be useful here for larger file volumes. It is more performant to do a CSV copy, but it can be a PITA when it comes to having a role with the right permissions to copy data from the right location. Anything else you'd recommend I look into? |
|
bq load --autodetect --source_format=CSV mydataset.mytable ./myfile.csv
And snowflake uses INFER_SCHEMA I believe you can do this
select * from table( infer_schema( location=>'@stage/my file.csv', file_format=>'my_csv_format' ) );
Although tbh I'm not sure if that's what you're looking for. You might enjoy looking at duckdb for stuff like this. My policy when starting data engineering was to bung everything into pandas dataframes, and now my policy is to try to avoid them at all costs because they're slow and memory hungry!