by endymi0n 814 days ago
Far. As in, really, really far. We started out with Postgres because it was simply the sensible option for a production prototype, and when somebody recently told me we needed something more scalable, I calculated that there's not even enough addressable market in the world for our business to grow beyond 4x our current size. That's exactly the two remaining vertical doublings of our DB instance (to ridiculous-looking RAM & CPU numbers) we could still do in case we need them.
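
For anyone who wants to redo that headroom arithmetic for their own numbers, here is a minimal sketch. The 4x figure is the one from the comment above; the function name is made up for illustration, and it assumes each vertical upgrade roughly doubles capacity.

```python
import math

def doublings_needed(growth_factor: float) -> int:
    """Number of instance-size doublings needed to cover a growth factor.

    Assumes (hypothetically) that each vertical upgrade doubles capacity.
    """
    if growth_factor <= 1:
        return 0  # already big enough
    return math.ceil(math.log2(growth_factor))

# 4x the current size fits into two doublings of the instance.
print(doublings_needed(4))
```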

Other than that, thanks to a lot of the recent work on connection handling and concurrency since PG 11, Postgres keeps getting better at actually using these additional resources well: https://www.enterprisedb.com/blog/performance-comparison-maj...


Several years ago I was a contractor on a project that would pull in data from all kinds of sensors and moving objects around a large city. It was decided that we needed Kafka and a few similar tools to handle it.

It was not hard to calculate the current max traffic or estimate the traffic growth over the next 10 years.

I did a demo of the system running on my laptop (all of it) + Postgres handling 100x the current data without too much difficulty.
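
A rough sketch of that kind of estimate, for illustration only. The 100x headroom is the figure from the demo above; the growth rates are hypothetical inputs you would replace with your own, and both function names are invented here.

```python
import math

def projected_multiplier(annual_growth: float, years: int) -> float:
    """Traffic multiplier after `years` of compound growth.

    annual_growth is a fraction, e.g. 0.3 for a hypothetical 30% per year.
    """
    return (1 + annual_growth) ** years

def years_of_headroom(headroom: float, annual_growth: float) -> int:
    """Smallest whole number of years after which traffic reaches `headroom`x."""
    return math.ceil(math.log(headroom) / math.log(1 + annual_growth))

# Example: with traffic doubling every year (a deliberately aggressive,
# made-up rate), how much bigger is it after 10 years, and how long does
# 100x headroom last?
print(projected_multiplier(1.0, 10))
print(years_of_headroom(100, 1.0))
```

If the 10-year multiplier comes out well below the headroom you measured, the single-box setup covers the planning horizon.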

Still, they went with the "scale" solution because it was the right design. (And of course the consultants and I got quite a lot more work to do, so we made a good deal more money.)

At this point you can get 24TB of RAM in an EC2 instance (along with 448 vCPUs, 100Gbps of network bandwidth and 38Gbps of EBS bandwidth). That won't scale forever, but Stack Overflow has been running on a single primary/standby setup with 1.5TB of RAM so that would be 16x Stack Overflow's RAM.

I think a lot of work goes into horizontal scaling which is necessary at a certain scale, but very few people actually get anywhere near that scale. It can be important to understand which things are needed at your scale and where you can simply buy some beefier hardware. I've been at places where people run a dozen sharded DB servers with each server having 16GB of RAM. Maybe that's resume-driven-development where someone wants to say they've done that.

A bunch of smaller distributed instances could be cheaper than one big one at equivalent size/compute. It also allows you to grow as needed, without worrying about things like DB transfer, instead of absorbing a big upfront cost.

I agree it adds a lot of complexity to the problem, which is another cost.

I guess this would be another argument for pay-as-you-go cloud-managed DBs, despite being more expensive than rolling your own.

Scaling horizontally undoubtedly introduces complexity but it also comes with some upsides:

    * DB backups are now (much) faster.
    * Smaller backups mean faster restores, which reduces your RTO (Recovery Time Objective).
    * If you have a well-architected application, a catastrophic DB failure will now only impact a portion of your userbase instead of all of them.

There are probably more good reasons, but these are the ones I could think of now.
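
To put rough numbers on the points above, here is a back-of-the-envelope sketch. Everything in it is an assumption: data is split evenly across shards, only one shard fails at a time, restore throughput is constant, and the sizes and rates are invented.

```python
def restore_hours(total_gb: float, gb_per_hour: float, shards: int) -> float:
    """Hours to restore ONE failed shard, assuming an even split of the data."""
    return total_gb / shards / gb_per_hour

def impacted_fraction(shards: int) -> float:
    """Share of users hit by a single-shard failure, assuming an even split."""
    return 1 / shards

# Hypothetical 1000 GB database restored at a hypothetical 100 GB/hour.
for shards in (1, 4):
    print(shards, restore_hours(1000, 100, shards), impacted_fraction(shards))
```

This ignores the extra failure modes that more machines bring, so treat it as an upper bound on the benefit.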

Is high availability or easier backups why people look to horizontal scaling though? I don't think that's ever been a primary reason in any story I've read. It's a great "bonus", but I can't see it being a compelling reason to choose horizontal over vertical scaling.

(There are other reasons, of course...)

I remember using PG 10 at a previous company that was kinda abusing Postgres as a data processing tool with temp tables. Even with the parallel scans etc, we found it was a lot faster to split our queries (mostly INSERT(SELECT...)) into separate ones operating on separate ranges of rows, one for each CPU core. We'd run EXPLAIN to print out the plan then shard on the innermost or outermost join. I even implemented a huge sparse matrix addition/multiplication calculator this way, chaining multiple operations into a single huge query, far exceeding the limits of numpy. I've always wondered if Postgres could be used as a more efficient Spark backend.
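
A minimal sketch of that range-splitting idea, not the actual code from that job. The table and column names (`results`, `measurements`, `id`, `value`) are made up, and `execute` is a placeholder for whatever function you supply that opens its own connection and runs one statement.

```python
from concurrent.futures import ThreadPoolExecutor

def split_range(lo, hi, parts):
    """Split the half-open id range [lo, hi) into up to `parts` contiguous chunks."""
    size, extra = divmod(hi - lo, parts)
    chunks, start = [], lo
    for i in range(parts):
        # The first `extra` chunks get one more row to absorb the remainder.
        end = start + size + (1 if i < extra else 0)
        if end > start:
            chunks.append((start, end))
        start = end
    return chunks

def build_statements(lo, hi, parts):
    """One INSERT ... SELECT per chunk (hypothetical tables and columns)."""
    template = (
        "INSERT INTO results (id, total) "
        "SELECT id, sum(value) FROM measurements "
        "WHERE id >= {lo} AND id < {hi} GROUP BY id"
    )
    # Bounds are integers computed here, so formatting them in is safe.
    return [template.format(lo=a, hi=b) for a, b in split_range(lo, hi, parts)]

def run_all(statements, execute, workers):
    """Run each statement via the caller-supplied `execute`, one per worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(execute, statements))

# Dry run: print the statements instead of sending them to a database.
run_all(build_statements(0, 1_000_000, 4), print, workers=4)
```

Each worker needs its own database connection for the statements to actually run in parallel on the server; threads are enough on the client side because the work happens in Postgres.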

It usually scaled linearly. We had a 32-core (64-vcore) server, saturating all cores and running a bit more than 32x as fast as a single query. In some cases, it was less than linear but still much better than a single query, and I think that was only because of mistakes like uuid4 pkeys.

How long does it take to decrypt and back up your database at its current size?