Hacker News new | ask | show | jobs
by grzm 2837 days ago
> "- Not distributable" This is something that's actively being worked on. It's also useful to look at the systems that are more easily distributable, and compare other features they posses with Postgres. Aphyr has provided a wonderful service with Jepsen showing just how difficult correctness in general (with and without the added complexity of distributability) actually is.

- "Poor resiliency" Do you mean something by this beyond as an aspect of distributability?

- "Hard to upgrade" The story here has been much improved, both with pg_upgrade and though features such as logical replication.

- "Schemas remains at postgres core" Would you elaborate what you mean by this?

I would also ask you to elaborate what you mean by "However it's also one of the most (if not the most) bloated." In terms of binary size? Feature set? Actual on-disk data representation? If so, what are you comparing it to?

Similarly, "It feels like things haven't changed that much since the 90's.": Having used Postgres over the years, I can certainly say that things have become much easier. What in particular are you thinking of?

I also struggle to square what appears to be complaints of capability while also expressing frustration with "postgres bloat and overall complexity of the system." As with all things, there are tradeoffs and resource limitations. What would you in particular are you looking for? And given the nature of Postgres as an open source project, what are you doing to ensure that those features you desire are being worked on? There are plenty of developers and organizations that are willing to work on bespoke features given the appropriate support.

Jepsen: https://jepsen.io

1 comments

Thanks for replying. Honestly, I expected a more closed and rude answer because I know my opinion on this is quite unpopular. The context of my answer: I love PostgreSQL, I've been using it in production, day-to-day, at work and for personal projects. Still it happens that I have also used more modern databases and they fixed some flaws postgres has (while having alot of other flaws that's not the point). I was answering to the question "what would make anybody choose another database given that Postgres can do pretty much anything?".

> This is something that's actively being worked on. It's also useful to look at the systems that are more easily distributable, and compare other features they posses with Postgres. Aphyr has provided a wonderful service with Jepsen showing just how difficult correctness in general (with and without the added complexity of distributability) actually is.

Thanks. I'll definitely have a look into this. I'm definitely not saying that everything would be easy and ideal to implement in such a robust and mature software (I am even sure it would be a nightmare for the developers to develop and stabilize and that for this reason it's unlikely to ever happen). Other databases provides this out of the box, postgres just don't, and that's a good reason "why would anyone use anything else". Still, not a requirement for everybody, but more frequent than JIT IMO.

> Do you mean something by this beyond as an aspect of distributability?

I mean that physical replication is great, but you always get into troubles when it comes to failover. Pretty much any modern DB system can be put in front of a basic HTTP LB, and replicate data on n nodes. It's indeed tightly coupled with distribution, but I meant distribution in term of performance and sharding (horizontal scalability) and resiliency in term of reliability. Relying on DNS for failover induces a downtime, relying on a proxy induces a hack. I'm wondering why it would be so hard to fix this in postgres though.

> The story here has been much improved, both with pg_upgrade and though features such as logical replication.

This is definitely true. While I think logical replication could/should go even further to really fix the upgrade issue, PG 10 is a huge move in the right direction. Thanks for pointing that out.

> "Schemas remains at postgres core" Would you elaborate what you mean by this?

Maybe it's more a general feeling than something really concrete, but for instance, would you create a table with only one JSONB column. It would feel weird, like abusing/hacking around the schema. From the user perspective, it's impractical, not natural, and it doesn't look like a document store at all (as we see in nosql DBs). In this respect, I don't think I will ever use postgres as a document-only database because I feel like it's just not how postgres is built (unless I'm missing some great feature?). That's a subject were document-only database may have the point.

> In terms of binary size? Feature set? Actual on-disk data representation? If so, what are you comparing it to?

I'm only speaking in term of feature set. A lot of documentation comes with them. A lot of side-effects. A lot of wrong expectations from the user. A lot of complexity. A lot of maintenance burden which makes postgres more complex to evolve on huge topics.

Also, this is a common problem across the most used relational DBMS (MySQL, Oracle, SQL Server and Postgres). To give another perspective, ElasticSearch dropped a massive amount of features in version 6 just so that they can focus on the core things. I personally think it was a good move.

> Having used Postgres over the years, I can certainly say that things have become much easier. What in particular are you thinking of?

I have also used PG for years and while I reckon you are completely right (a lot of improvements have been made in a tremendous amount of places), I was pointing out that the core topic I listed were still such a pain to handle, while so important since day 1.

> I also struggle to square what appears to be complaints of capability while also expressing frustration with "postgres bloat and overall complexity of the system." As with all things, there are tradeoffs and resource limitations. What would you in particular are you looking for?

Ok, let me fix this: I'm not complaining at all, really really not. PG guys are doing a fantastic job, PG is a great software, no doubt about it. I would even go further: in my opinion postgres is the currently the best mature RDBMS in the place. I am just saying that postgres is not and probably never will be the answer to all the DB use-cases in the world. More than that: postgres is even failing on very basic stuff needed in almost every modern app (mostly because it was started at a time were HA was not really a concern). Put another way: some modern DBs are solving some modern problems better than postgres and might disqualify postgres for some of these use-cases. I did not meant anything else, believe me, and I apologize sincerely to all the developers that have worked on the project if my comment was taken like that.

You make a lot of mention of "modern databases" without naming any. I specifically referenced the Jepsen project as it's repeatedly shown that many/most of the "modern databases"† (and those with more history) actually don't provide the features you describe without issue. Postgres is upfront about the current limitations rather than market itself as something it isn't. I'd ask to you be specific when calling out projects you prefer for your use cases so they may be actually compared. Handwaving serves no one well.

No system is the absolute best for every use case. No one is saying it is. But please be fair and lay all the cards on the table so we can all see and judge for ourselves.

† As an aside, this use of "modern database" sticks in my craw a bit. It's very imprecise. It seems you might be using it to mean anything non-relational or perhaps non-SQL. The so-called NoSQL database systems are often quite different even amongst themselves: lumping them together does a disservice to the individual implementations. Then again, perhaps I'm misinterpreting what you mean.

I did not mention any database behind the very imprecise term "modern database" because it was not the point. Those databases at least try to solve those real-world issues in good or bad ways. Some people/companies have the empirical proof that they work for their use-case in production, just like I do, and I don't think the conversation is specifically about how good the DB solves the issue when Postgres does not even try to solve it. And again, I'm not saying this is bad (well, some of these points have become so essential nowadays in most of real-world applications that I think it's a very practical issue), but these are elements to take into account when opting for one DB or another. Sometimes, it may disqualify postgres, some other times it may not. In this respect, the original assertion (the one that made me react in the first place) “if PG gets better text search […] I don't see much reason to use anything else either.” just sounds very wrong to me. It depends on so much other (more important) things. At work, I have colleagues that are huge postgres fans to the point where they loose any critical sense on it. I think this situation is never ever good for engineers when it comes to taking the right decisions seriously.

I also know that distributed systems don't come without their own issues and complexity (CAP mostly, but also distributed systems = more complex = more bugs, and also younger = less mature = more bugs, and configuration issues, sharding issues). Some databases are very clear about those bugs and limitations (https://www.elastic.co/guide/en/elasticsearch/resiliency/cur...) some aren't (I don't think MongoDB documents them). Behind "modern databases", I am thinking of ElasticSearch, CouchDB (they solved a lot of issues regarding scalability recently by merging BigCouch in v2), MongoDB (arguably one of the worst way to address all the mentioned problems, but whatever), DynamoDB, for the databases I've been using or I'm currently using in production. I've also played around with AWS Aurora (yes, AWS is forking MySQL and Postgres to solve those issues at root which is a good proof that there is actually a demand), and also more specific databases like InfluxDB or key-value stores like Consul's. They all have their own solutions and tradeoffs. But I'm not sure mentioning them is very relevant for the argumentation.

Honestly, the issue I have is I don't care if modern databases try to solve those problems, I only care if they _actually_ solve those problems, instead of them lying that they provide CAP and are distributed/infinitely scalable, when in fact, it isn't.
> for instance, would you create a table with only one JSONB column.

Sure. To make it more practical, I'd use a primary-key column too (like most document-only databases do).

We actually do use a (INTEGER PK, JSONB) table in Pg, for a variable-depth permission-model store. The permissions (per PK) are encoded into a JSON document and queried using a 'get json object at path' lookup. Turns out, the way json & jsonb operators and functions are designed in Pg, it feels exactly like a document store, just available in your SQL (which is a plus, in my book).

I think the larger theme here is your feeling that this is abusing or hacking around "the schema", but really, what is "the schema"? I've seen people use MySQL as just a store, using plain, NULLABLE VARCHAR and INT columns, and enforce a "schema" at the application layer, serializing and de-serializing their data from the VARCHAR columns. I've seen projects reinvent MVCC using an INT 'version' column in their tables.

Databases can only allow defining and enforcing a schema at the database layer, they can't force it. Some databases allow a lot of control and many ways to define a schema, some allow little, but there's no requirement that one MUST use all the ways when available. In fact, even with rich databases like Postgres, sometimes even the db-level enforceable schema is not sufficient and one has to make do with enforcing some of it at application-level, and I'm talking about just the traditional data types here.

Using JSON columns and operators and functions is — at the end of the day — using just another data type. Given Postgres is an object-relational database (it IS, after all, post-INGRES), it's only natural to use Postgres with data types like JSON (and arrays, and ranges, and GIS).

> I don't think I will ever use postgres as a document-only database because I feel like it's just not how postgres is built (unless I'm missing some great feature?)

Partial jsonb indexing (\w full-text), transactional fast joins on jsonb fields, and higher Jepsen rating than most noSQL DBs is pretty killer for me.

What are these amazing modern DB's you're speaking of that are kicking the pants off of Postgres that I seem to be unaware of?
I use PostgreSQL as a document db using Marten.