Hacker News new | ask | show | jobs
by andy_ppp 3397 days ago
How well does, say, Postgres scale on such hardware? Is anything more that 8 cores overkill or can we assume good linear increases in queries per second...
2 comments

Depends on your queries. I am looking at a server right now that uses 80% of 32 cores with Postgres 9.6. It's doing lots of upserts and small selects. Averages 76k transactions per second. I think it could easily take advantage of a 64 core system.

The main scalability issue I have with Postgres is its horrible layout of data pages on disk. You can't order rows to be layed out on disk according to primary key. You can CLUSTER the table every now and then but that's not really practical for most production loads.

I think I saw a proposal recently for something that would cover this use case. IIRC it was for an index organized table that stores the entire contents in a btree (so it would naturally be stored in primary key order).

I don't think there's been any work on it yet though.

The main utility of CLUSTER is to mark a table as sorted - that is, few disc seeks will be required on an index-scan of the table. This is important when doing (for example) a merge join with another table, or just when you are requesting a large proportion of the table in sorted order. Postgres knows enough statistics about the order of entries in the table to know that it can read the table faster in order using the index with the occasional seek for recently added elements than if it was to do a sequential scan and re-sort the table in memory (spilling to disc if necessary).

A B-tree can in no terms be described as being laid out on disc in primary key order. The individual pages of the tree are placed on disc randomly, as they are allocated. Therefore an index scan won't return the rows in index order as quickly as the current scheme of having the rows separate from the index and sorting them every now and again.

Ultimately, for the goal of fast in-order scan of a table while adding/removing rows, you need the rows to be laid out on disc in that order, so that a sequential scan of the disc can be performed with few seeks. This requires that inserted rows are actually inserted in the space they should be, which is not always possible - often there isn't space in the page, and you don't want to spend lots of time shifting the rest of the rows rightwards a little bit to make space. To a certain extent Postgres already does insert in the right place if there is space in the right disc page (from deleted rows), but because this is not always possible, the solution is to re-CLUSTER the table every now and again.

I think the Postgres way is actually very well thought out.

Have you looked into pg_repack [1]? It's a PostgreSQL extension that can CLUSTER online, without holding an exclusive lock. I haven't used it, but it looks interesting as an alternative to the built-in CLUSTER.

[1] http://reorg.github.io/pg_repack/

This is from 2012: http://rhaas.blogspot.com/2012/04/did-i-say-32-cores-how-abo...

My guess is the 1 socket options scales great. 2 sockets are are less than ideal, and you will not double the 1 socket performance.

Yeah, every single release since then has had work done on scaling Postgres up on a single machine. They have been working on eliminating bottlenecks one after another to allow it to scale on crazy numbers of cores.

I've seen benchmarks on the -hackers mailing list with 88 core Intel servers (4s 22c) in regard to eliminating bottlenecks when you have that many cores. So even if it's not 100% there yet, it will be soon.

I'd like to see this data on Postgres scaling updated, with more info on the write scaling as well. (the chart appears to be for SELECT queries only)
The other change is now a single select can use multiple cores, so you could see how that scaled to 32, 64, 128 cores...
On highly concurrent PG systems by when using parallel queries you are sacrificing throughput for better latency. You really don't want to use more than a fairly small number of workers per single select.
Outside of some very exotic scenarios you are IOPS bound on writes and not CPU bound.
Is that still a problem with cheap NVMe drives that can do 500k IOPS?
Not a problem at all if you provision your server properly so you aren't IOPS bound, though there are plenty of databases which simply won't fit on NVMe drives, so in those cases, yes it could still be a problem.
Not likely to be a problem if you have FusionIO drives. Likely to be a problem sooner or later on everything else. A definitive recurrent issues on all cloud providers and NAS/network drives.
Nope but everyone is running in the "cloud" and there you are lucky to get 50K IOPS
This is one of the reasons I don't run in the cloud.
I've seen decent enough scaling on 8 socket servers. There's still some bottlenecks if you have a lot of very short queries (because there's some shared state manipulations per query & transaction), but in a lot of cases postgres scales quite well even at those sizes.