by sebk 1404 days ago
Besides storage itself, the Postgres compute layer has a good amount of (transient) state that doesn't lend itself to either compute nodes or clients springing in and out of existence in a serverless environment. For instance, a fresh compute node with an unfilled cache can perform horribly, and Postgres client connections don't scale well with transient clients. Both of these problems, and others in the same category, were very noticeable for Aurora Serverless. My understanding is that AWS mitigates these two with an elaborate cache-filling service for new nodes and a pgbouncer-style proxy that pools connections and hides compute-node rescheduling from clients.
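
To make the pooling point concrete, here is a minimal sketch of the idea, not how AWS or pgbouncer actually implement it. `FakeBackend` and `Pool` are hypothetical names, and the backend is a stand-in rather than a real Postgres connection. It only shows many short-lived clients sharing a small, fixed set of long-lived backend connections.

```python
import queue
import threading


class FakeBackend:
    """Hypothetical stand-in for an expensive, long-lived backend connection."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        # Count how many backend connections are ever opened.
        with FakeBackend._lock:
            FakeBackend.opened += 1

    def execute(self, sql):
        return f"ok: {sql}"


class Pool:
    """Hands out a fixed set of backends to any number of transient clients."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeBackend())

    def run(self, sql):
        conn = self._idle.get()  # blocks until a backend is free
        try:
            return conn.execute(sql)
        finally:
            self._idle.put(conn)  # return the backend for the next client


def transient_client(pool, i, results):
    # Each client appears, runs one query, and disappears.
    results[i] = pool.run(f"SELECT {i}")


pool = Pool(size=4)
results = [None] * 100
threads = [
    threading.Thread(target=transient_client, args=(pool, i, results))
    for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch 100 clients come and go while only 4 backends are ever opened. Session state (prepared statements, session settings, temp tables) is exactly what a scheme like this cannot carry across clients, which is why the statelessness question matters.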

What's Neon's point of view about transient state in nodes? Is there a world where serverless client connections are stateless, or is the setup overhead not expected to be worth the cost?

2 comments

Right. Our design guideline is to get as much serverless behavior as possible while keeping full Postgres compatibility (in terms of features and expected performance). Single-node Postgres can give you hundreds of thousands of small RW queries per second, so competing connections should be a few compare-and-swap instructions away from the shared state to provide this performance. So for the primary, it means it should be just Postgres in a container or VM, and we have to deal with the consequences (cache pre-warming, handling cross-node migrations, etc.).

However, read-only nodes require less coordination, and we have way more freedom there, so read-only Postgres as a function seems to be a more feasible concept.

This is a very good question. We are working on it and will be publishing a blog post on autoscaling very soon. We are experimenting with VM migration technology that would allow us to transfer state between compute nodes and fail over traffic.

We have some encouraging early results, but haven't committed to a particular technology (like Cloud Hypervisor) yet.