by getly_store
101 days ago
Yep. Heavy ETL tooling often adds latency; a staging table loaded with COPY into Postgres, followed by idempotent upserts (INSERT ... ON CONFLICT), is usually enough. Keep it incremental and observable: checksums, row counts, and replayable loads. At larger scale, add CDC (e.g. Debezium, which reads changes through Postgres logical decoding) and parallelize ingestion across partitions; minimize in-Python transforms and push the work into SQL.
Unfortunately, the "ETL pipeline" I mentioned didn't even use transactions and was opening a new connection for every insert. No wonder it was slow.
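For contrast, a minimal sketch of the two loading styles, again with `sqlite3` standing in for Postgres (a file database, so separate connections see the same data). `load_per_row` mimics the slow pipeline: a fresh connection and a commit for every insert. `load_batched` uses one connection, one transaction and one batched statement. The `events` table and all function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Sketch only: sqlite3 stands in for Postgres; the same shape applies with a
# Postgres driver (one connection, one transaction, batched statements).


def create_table(path):
    """Create the hypothetical events table if it does not exist."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")
        conn.commit()


def load_per_row(path, rows):
    """Anti-pattern: open a connection and commit once per inserted row."""
    for row in rows:
        with closing(sqlite3.connect(path)) as conn:
            conn.execute("INSERT INTO events VALUES (?, ?)", row)
            conn.commit()


def load_batched(path, rows):
    """One connection, one transaction, one batched statement."""
    with closing(sqlite3.connect(path)) as conn:
        with conn:  # single transaction: commit on success, rollback on error
            conn.executemany("INSERT INTO events VALUES (?, ?)", rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        db = os.path.join(d, "events.db")
        create_table(db)
        load_batched(db, [(i, f"event-{i}") for i in range(1000)])
```

Besides avoiding per-row connection and commit overhead, the single transaction makes the load all-or-nothing: if one row fails (say, a duplicate key), `load_batched` leaves the table untouched, while `load_per_row` keeps whatever rows were inserted before the failure.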