by hot_gril 821 days ago
I remember using PG 10 at a previous company that was kinda abusing Postgres as a data processing tool with temp tables. Even with parallel scans etc., we found it was a lot faster to split our queries (mostly INSERT ... SELECT) into separate ones, each operating on its own range of rows, one per CPU core. We'd run EXPLAIN to print out the plan, then shard on the innermost or outermost join. I even implemented a huge sparse matrix addition/multiplication calculator this way, chaining multiple operations into a single huge query, far exceeding the limits of numpy. I've always wondered if Postgres could be used as a more efficient Spark backend.

It usually scaled linearly. We had a 32-core (64-vcore) server, and with all cores saturated it ran a bit more than 32x as fast as a single query. In some cases scaling was less than linear but still much better than a single query, and I think that was only because of mistakes like uuid4 pkeys.
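
For anyone curious what the per-core range splitting looks like, here is a rough sketch in Python. It only builds the sharded statements. The table names (`out_tbl`, `a`, `b`), the integer `id` column, and the helpers `split_ranges` and `shard_statements` are all made up for illustration, and it assumes the shard key is a dense integer, which is not necessarily how our setup worked.

```python
# Hypothetical query: table and column names are illustrative only.
TEMPLATE = (
    "INSERT INTO out_tbl "
    "SELECT a.id, b.val FROM a JOIN b ON b.a_id = a.id "
    "WHERE a.id >= {lo} AND a.id < {hi}"
)


def split_ranges(lo, hi, parts):
    """Split the half-open range [lo, hi) into up to `parts` contiguous chunks."""
    base, extra = divmod(hi - lo, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` chunks take one additional row each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more parts than rows: skip empty chunks
        ranges.append((start, start + size))
        start += size
    return ranges


def shard_statements(template, lo, hi, parts):
    """Build one statement per chunk; bounds are ints, so formatting is safe."""
    return [
        template.format(lo=int(a), hi=int(b))
        for a, b in split_ranges(lo, hi, parts)
    ]


if __name__ == "__main__":
    # One statement per core on a 32-core box.
    for stmt in shard_statements(TEMPLATE, 0, 1_000_000, 32):
        print(stmt)
```

Each statement would then run on its own connection (for example from a thread pool), since a single Postgres session executes one statement at a time. That part is left out because it depends on the driver.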