Hacker News new | ask | show | jobs
by brunoqc 637 days ago
Thanks. My current pattern is to parse the files with rust, copy from stdin into a psql temp table, update the rows that have changed and delete the rows not existing anymore.

I'm hoping it's less wasteful than truncating and importing the whole table every time there is one single change.

1 comments

It probably is, but you can do that workflow with DuckDB too. It's just really flexible since it can all be done in SQL. Check out the DuckDB postgres connection (https://duckdb.org/docs/extensions/postgres.html)

To import from a Salesforce Bulk Export CSV (for example) into a postgres table is a few lines of SQL

``` INSERT INTO pgdb.accounts ( id, created_at, updated_at ) SELECT "Id", "CreatedDate"::TIMESTAMP, "UpdatedDate"::TIMESTAMP FROM read_csv('/path/to/file.csv') WHERE "Id" NOT IN (SELECT id FROM pgdb.accounts) ```

The path can be in a cloud storage provider as well, which is really nice. You can do updates which join from the postgres db or the other CSV files (or a MySQL database) as well. The data transforms (casting to timestamp, uuid, etc.) have been super handy along with all the other SQL niceties that you get