HN Mirror

	by getly_store 101 days ago
	Yep. A heavyweight ETL layer often just adds latency. For many workloads, COPY into a Postgres staging table followed by idempotent upserts (INSERT ... ON CONFLICT) into the target is enough. Keep loads incremental and observable: log row counts and checksums, and make every load safely replayable. At larger scale, add CDC (e.g. Debezium on top of Postgres logical decoding) and parallelize ingestion across partitions. Minimize transforms in Python and push the work into SQL.
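A minimal sketch of the staging-table-plus-upsert pattern, under stated assumptions. It uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs without a server. The bulk load into staging uses executemany where Postgres would use COPY. The upsert syntax (INSERT ... ON CONFLICT ... DO UPDATE with `excluded`) is shared by both databases. The table names `items` and `items_staging` and their columns are hypothetical. Each batch is assumed to contain unique ids.

```python
import sqlite3

# Hypothetical schema: a target table keyed by id, plus an unconstrained
# staging table with the same columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS items (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
CREATE TABLE IF NOT EXISTS items_staging (
    id    INTEGER,
    name  TEXT,
    price REAL
);
"""


def load_batch(conn: sqlite3.Connection, rows: list[tuple[int, str, float]]) -> int:
    """Stage a batch, then upsert it into the target in one transaction.

    Replaying the same batch leaves `items` unchanged, so loads are
    replayable. Assumes ids are unique within a batch. Returns the staged
    row count so it can be logged or compared against the source.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM items_staging")
        # Bulk load into staging (stand-in for COPY in Postgres).
        conn.executemany(
            "INSERT INTO items_staging (id, name, price) VALUES (?, ?, ?)", rows
        )
        # Observability check: staged count must match what we were given.
        staged = conn.execute("SELECT COUNT(*) FROM items_staging").fetchone()[0]
        if staged != len(rows):
            raise RuntimeError(f"staged {staged} rows, expected {len(rows)}")
        # Idempotent upsert: insert new ids, overwrite existing ones.
        # SQLite needs a WHERE clause on INSERT ... SELECT ... ON CONFLICT
        # to avoid a parsing ambiguity; Postgres does not require it.
        conn.execute(
            """
            INSERT INTO items (id, name, price)
            SELECT id, name, price FROM items_staging WHERE true
            ON CONFLICT (id) DO UPDATE SET
                name = excluded.name,
                price = excluded.price
            """
        )
    return staged


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    batch = [(1, "widget", 9.99), (2, "gadget", 19.5)]
    load_batch(conn, batch)
    load_batch(conn, batch)  # replaying the same batch adds no duplicates
    print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # prints 2
```

The whole stage-check-upsert sequence runs in one transaction. A failed count check therefore rolls back cleanly and leaves the target untouched, which is what makes re-running a load safe.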

1 comment

Yep, that's the pattern I follow.

Unfortunately, the "ETL pipeline" I mentioned didn't even use transactions and was opening a new connection for every insert. No wonder it was slow.
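A rough sketch contrasting the two approaches, again using Python's built-in sqlite3 module as a stand-in so it runs anywhere. The `events` table and the helper names are hypothetical. The per-row version opens a connection and commits once per insert, as described above. The batched version reuses one connection and one transaction for the whole batch. The same idea carries over to Postgres: reuse a connection (or pool), wrap the batch in a transaction, and prefer COPY or batched inserts over row-at-a-time commits.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical table for illustration.
DDL = "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"


def init_db(db_path: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(DDL)
    finally:
        conn.close()


def insert_slow(db_path: str, rows: list[tuple[int, str]]) -> None:
    """Anti-pattern: a new connection and a separate commit for every row."""
    for row in rows:
        conn = sqlite3.connect(db_path)  # connection setup cost paid per row
        conn.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        conn.commit()  # one durable commit per row
        conn.close()


def insert_batched(db_path: str, rows: list[tuple[int, str]]) -> None:
    """One connection, one transaction: all rows commit together or none do."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error (does not close)
            conn.executemany("INSERT INTO events (id, payload) VALUES (?, ?)", rows)
    finally:
        conn.close()


def count_rows(db_path: str) -> int:
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    rows = [(i, f"event-{i}") for i in range(300)]
    with tempfile.TemporaryDirectory() as tmp:
        for name, insert in (("per-row", insert_slow), ("batched", insert_batched)):
            path = os.path.join(tmp, f"{name}.db")
            init_db(path)
            start = time.perf_counter()
            insert(path, rows)
            elapsed = time.perf_counter() - start
            print(f"{name}: {count_rows(path)} rows in {elapsed:.3f}s")
```

Timings depend heavily on disk and fsync behavior, so no specific numbers are claimed here. The batched version also gets atomicity for free: if any row fails, the whole batch rolls back instead of leaving a partial load.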