Hacker News new | ask | show | jobs
by egnehots 819 days ago
Postgres is far from perfect:

- The codebase is old and huge, accruing some heavy technical debt, making it a less than ideal foundation for iterating quickly on a new paradigm like AI and vector databases.

- Some ancient design decisions have aged poorly, such as its one connection per process model, which is not as efficient as distributing async tasks over thread pools. If not mitigated through an external connection pooler you can easily have real production issues.

- Certain common use cases suffer from poor performance; for example, write amplification is a known issue. Many junior developers mistakenly believe they can simply update a timestamp or increment a field on a main table with numerous columns.

So, yes, PG is one of the best compromises available on the database market today. It's robust, offers good enough performance, and is feature-rich. However, I don't believe it can become the ONE database for all purposes.

Using a dedicated tool best suited for a specific use case still has its place; SQLite and DuckDB, for instance, are very different solutions with interesting trade-offs.

4 comments

I believe that there are contributors currently working on a one thread per connection version of PostgreSQL. It's a huge amount of work so I wouldn't expect it to be released tomorrow.

Regarding wide updates, I believe that HOT updates already partially solve this problem.

> Some ancient design decisions have aged poorly, such as its one connection per process model

Oracle uses the same model by default on Linux.

Since 19 (or maybe earlier) it is configurable though, but the default is still one process per connection if I'm not mistaken.

It's actually quite impressive they were willing & able to make such a drastic change in such an old and conservative codebase.
Not sure how drastic it was to be honest. Under Windows, multi-threading was always the only option. So in theory, they already had code for that.
Can you tell me more about the write amplification issue?
Postgres handles updates as insert+delete, and its secondary indexes reference the physical location of the row, instead of the primary key. This means that whenever an update results in an insert to a different page, the index needs to be updated as well, even if the indexed column hasn't been modified.

At least it has an optimization that if the insert ends up in the same page, it won't need to update the index https://www.postgresql.org/docs/current/storage-hot.html

Replication has a similar amplification issue. Historically postgres has favored physical replication over per-row logical replication, that means that replication needs to transfer every modified page, including modified indexes, instead of just the new value of the modified row. (I think logical replication support has improved over the last couple of years).

There is the OrioleDB project, which attempts to improve on the design flaws in postgres's storage engine, but it's definitely not production ready yet.

It’s worse than that, unfortunately – since the entire row has to be updated due to MVCC, if a single index is updated, then _all_ indexes are updated. If you have a wide table with a lot of indexes, each UPDATE is N*index writes. This becomes even worse with multiple updates if indexed columns aren’t k-sortable (like UUIDv4), since now the DB will probably have to jump around multiple pages despite the rows being logically sequential.
> new paradigm like AI and vector databases.

they have several ways to write extensions: extensions and fdw, so you can build your cool AI stuff without digging into PgSQL sources much.