| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by adammarples 750 days ago

Yeah I'm not sure about redshift, but bigquery uses "autodetect" so something like

bq load --autodetect --source_format=CSV mydataset.mytable ./myfile.csv

And snowflake uses INFER_SCHEMA I believe you can do this

select * from table( infer_schema( location=>'@stage/my file.csv', file_format=>'my_csv_format' ) );

Although tbh I'm not sure if that's what you're looking for. You might enjoy looking at duckdb for stuff like this. My policy when starting data engineering was to bung everything into pandas dataframes, and now my policy is to try to avoid them at all costs because they're slow and memory hungry!

1 comments

qoega 750 days ago

In ClickHouse it is just `INSERT INTO t FROM INFILE 'data.csv.gz'`. Any supported format, any encryption, autodetected from file name and sample data piece to get column types, delimeters etc. Separate tools to convert CSV are not necessary if you can just import to db and export as SQL Statements.

echo "name,age,city John,30,New York Jane,25,Los Angeles" > example.csv

clickhouse local -q "SELECT * FROM file('example.csv') FORMAT SQLInsert" INSERT INTO table (`name`, `age`, `city`) VALUES ('John', 30, 'New York'), ('Jane', 25, 'Los Angeles');