Hacker News new | ask | show | jobs
by threeseed 2841 days ago
Shame comments like this get downvoted. Because it's 100% true.

PostgreSQL still is lacking with its horizontal scalability story and there is no reason to choose it over something like Solr/ElasticSearch right now largely because of this.

3 comments

You're the reason why I inherit project that uses only ES and then end up doing relational schema in ES because they didn't use relational database in general.

If you had to choose between PostgreSQL or ElasticSearch then you have no idea what you're doing.

I recently interviewed at a startup that explicitly uses both, with PG being something of an authoritative backend store that feeds into ES as the primary frontend store (since the frontend is search-heavy).

It might be possible to get it to work without PG, but I imagine it would require a huge engineering effort (akin to an entire product on its own, which, IIRC, is what CrateDB is).

(Although I believe I understand your point, I fear that your tone and ad-hominem language detracts from it).

https://github.com/zombodb/zombodb works for me by turning Elasticsearch into an index type (PG10 only)
If it is 100% true then can you explain what he is referring to with "Poor resiliency" and "Schemas remains at postgres core"?

Also PostgreSQL is not hard to update, the only issue with PostgreSQL and upgrades is that it is hard to upgrade with zero downtime (pg_upgrade is very fast but still requires a small bit of downtime).

I can agree with the complaint about distribution, but that is a common issue for most databases not built from the start around distribution and nothing I would call "fundamentals".

Here's a real world example - I cannot use pg_upgrade because I have PostGIS installed and you cannot use different versions of PostGIS when upgrading, but I cannot install the older version in pg10.

We had to go to a slony hack (and put auto increment columns on a lot of tables temporarily) because of that.

So he's totally right, the upgrade path is not easy.

Sorry if it was unclear. I think I explained preciely what I meant in my responses to grzm. I don't think there's any debate on the "poor resiliency" point for anybody that has already used a distributed database or even just used postgres in production and experienced a slave promotion. For the "Schemas remains at postgres core" thing, it's probably not as straightforward, but once again, I really think anybody who has ever used a document-driven database can understand what I mean through this. You really don't use postgres as you use a document-driven database because it's impractical. (More generally, this point could be completed with a list of all the features that are easy to handle in pretty much any non-19xx database and that are a nightmare in postgres (users, permissions, indices creations/deletion, …).)

threeseed has summed my thoughts up. Solar/ES are solving issues that postgres just don't address at all at the moment, and it's definitely not only about text search (which is also a point). Saying that postgres should be/can be used for any use-case apart from TS is an absolute nonsense and that's what made me react in the first place. Moreover, the feeling that the community don't largely agree just on the fact that postgres doesn't meet these four points is indeed a bit appalling. OK, saying that postgres should try to solve them is another debate, but let's just agree on the fact that if you absolutely need a very resilient OR distributed OR document-based OR easy-to-upgrade DBMS (although for this last point, I have to agree that things have changed recently, partly thanks to the Uber gate I think), postgresql is probably not a good choice. There's definitely a HUGE trade-off and we should agree on this. Postgres don't come even close to Solar/ES in term of horizontal scalability. I'd really like it could, but as of today it can't.

> a nightmare in postgres (users, permissions, indices creations/deletion, …).

Could you expand on what you mean here? By 'users, permissions', I assume you meant management of users and access control to records at the database layer. If so, I should tell you I use database-enforced roles to manage access to various areas of the schema I manage, and RLS to protect access to records. If not, I'd very much like to know what exactly you meant.

Could you expand a bit on what exactly you mean by "indices creations/deletion"?

> things have changed recently, partly thanks to the Uber gate I think

Logical replication in core was in the works long before Uber's post, and already available as a plugin for at least two releases at the time. I remember, because we ourselves used it (for syncing data between environments, not for upgrades) in 9.6 (the then latest release).

I have benchmarked a single Postgres node performing comparably to a six node Cassandra cluster on timeseries data (using TimeseriesDB). The more I’ve had to deal with the limitations of other databases, the more I’ve come to think that people should just start with Postgres and only pick something else if Postgres doesn’t work out
> I’ve come to think that people should just start with Postgres and only pick something else if Postgres doesn’t work out

Even that standard results in people giving up on PG too early. It's possible to get it to continue working out, but it can require effort beyond a naive implementation and often more hardware knowledge than is common any more.

The other options can make naive implementations much easier, at least initially, and on almost any hardware. Of course, that usually ends up costing far more, even at modest scale.

Sometimes the issue with a comment isn't about its truth or lack therefore, but with its tone or potential for inciting flamewar (a.k.a. trollishness). With a brand new account, the intent is even harder to discern.

> PostgreSQL still is lacking with its horizontal scalability story

To the extent this is true, it may still be irrelevant for many environments. The ability to increase performance by just adding more servers (what I believe is usually meant by "horizonatal scalability") is not, despite its popularity, a desirable end in and of itself.

It's only desirable as a means of scalability, if scaling a "single" server (a.k.a. vertical) would reach the limit of available hardware, or if even approaching that limit becomes prohibitively expensive.

My impression from talking to managers (including startup founders) is that they often erroneously believe they fall into the latter category. Specifically, they believe that the overhead inherent in distributed systems is, essentially, negligible, so that they're avoiding the 2-4x price premium of a high-end single system. The reality is not so negligible, especially considering inefficiencies unique to a particular distributed database, regardless of if it's inescapable due to its nature.

The former can, of course, be a real concern, especially for VC-funded startups, for whom a lack of fast enough growth is its own failure. However, even in that situation, there's reason for skepticism, since most of the original "web scale" horror stories tended to involve MySQL (which had, and maybe still has, reduced capabilities for performance improvements through hardware) and database servers that hadn't even maxed out their I/O capabilities (not trivial to do naively pre-SSD).

> there is no reason to choose it over something like Solr/ElasticSearch right now largely because of this.

Even the OC points recognizes [1] that these distributed systems aren't without their drawbacks. Those drawbacks are additional reasons to choose Postgres over them.

[1] https://news.ycombinator.com/item?id=17964732