Hacker News new | ask | show | jobs
by karlmdavis 3070 days ago
From personal experience: PostgreSQL's COPY commands aren't really all that performant, indexes or no.

Our project saw SIGNIFICANTLY better performance with batched multi-threaded INSERTs. If you can run a few hundred load threads and manage the concurrency correctly (not trivial), it will chew through big loads like a monster.

If I ever have the time/excuse, I want to go back and try a multi-threaded COPY. But if you need speed and have a choice between multi-threaded INSERTs or a single-threaded COPY, go with the INSERTs every time.

1 comments

COPY of course is single threaded. If you can split your input data and run multiple copy operations you can get similar increases although I would never suggest hundreds of threads. Depending on your IO system something like twice as many threads as CPU threads is probably going to work better.

On a one for one basis COPY IN will be faster than inserts:

- COPY uses a special optimization in the access method: instead of running the full insert logic (find a target page, lock it, insert, unlock page) per row, it batches all the rows that will fit on the target page.

- COPY overall has shorter code paths than regular inserts.

Why not just use regular batched inserts with unnest to turn several array parameters into a table instead of using some arcane hard to use SQL command?