by emmanueloga_ 555 days ago
Is anyone here using YugabyteDB for high-availability Postgres?

It seems like a compelling option:

* Much closer to Postgres compatibility than CockroachDB.

* A more permissive license.

* Built-in connection manager [1], which should simplify deployment.

* Supports both high availability and geo-distribution, which is useful if scaling globally becomes necessary later.

That said, I don't see it mentioned around here often. I wonder if anyone here has tried it and can comment on it.

--

1: https://docs.yugabyte.com/preview/explore/going-beyond-sql/c...

5 comments

I was under the impression that Yugabyte requires signing a CLA to contribute, which leads me to avoid it for fear of them relicensing the thing when the VCs start squeezing. Also: it's very unique and driven by a single vendor. That seems like too much of a risk longer term, but that is just my take.

EDIT: in response to your question, I did run a PoC of it, but I wasn't able to create very large indexes without the statement timing out on me. Basic hand-benchmarking of complex joins on very large tables was very slow, if the queries finished at all. I suppose systems like this and Cockroach really need short, simple statements and high client concurrency rather than large, complex queries.

> DDL timeouts

That’s normal for building indices on large tables, regardless of the RDBMS. Increase the timeout, and build them with the CONCURRENTLY option.

> Query speed

Without knowing your schema and query I can’t say with any certainty, but it shouldn’t be dramatically slower than single-node Postgres, assuming your table statistics are accurate (have you run ANALYZE <table>?), necessary indices are in place, and there aren’t some horrendously wrong parameters set.

Not sure about the CLA process, but the database is already under a restrictive, proprietary license:

    ## Free Trial
    
    Use to evaluate whether the software suits a particular
    application for less than 32 consecutive calendar days, on
    behalf of you or your company, is use for a permitted purpose.

https://github.com/yugabyte/yugabyte-db/blob/master/licenses...

It's not really clear what this means (what is a permitted purpose?), but it seems the intent is that after 32 days, you are expected to pay up. Or at least prepare for a future when the infrastructure to charge customers is in place (if it isn't there yet).

Thanks. I think that only covers the commercial bits they run themselves though:

  "The entire database with all its features (including the enterprise ones) is licensed under the Apache License 2.0

  The binaries that contain -managed in the artifact and help run a managed service are licensed under the Polyform Free Trial License 1.0.0."

EDIT: formatting

It also mentions:

> By default, the build options generate only the Apache License 2.0 binaries.

So, it seems like the proprietary builds are for the managed services that they host themselves, which makes sense.

Index creation should not be controlled by the statement timeout, but by backfill_index_client_rpc_timeout_ms, which defaults to 24 hours. It may have been lower in old versions.

It seems Cockroach got all the love here indeed. We use Yugabyte and we are happy with it; for our use cases it is a lot faster and easier to work with than Cockroach.

I'm curious about this as well. I often see people talk about CockroachDB in production, but I don't think I've ever heard of anyone running Yugabyte. But it is definitely under active development.

I found two threads discussing it from the past year:

https://news.ycombinator.com/item?id=39430411

https://news.ycombinator.com/item?id=38914764

Yugabyte (as with CockroachDB and TiDB) is based on mapping relations to an LSM-tree-based KV store, where ranges of keys get mapped to different nodes managed through a Raft group. That kind of structure has very different performance characteristics compared to Postgres' page-based MVCC. In particular, LSM trees are not a free lunch.
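
To make the "not a free lunch" point concrete, here is a toy sketch of an LSM read path. It is my own illustration, not YugabyteDB's actual storage code: the `TinyLSM` class, its `memtable_limit` parameter, and the "places checked" counter are all hypothetical. Writes go to an in-memory table that is flushed into immutable sorted runs, and a read may have to look through several runs before it finds the key.

```python
class TinyLSM:
    """Toy LSM tree: one mutable memtable plus immutable runs, newest first."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # flushed runs, newest at index 0
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes are cheap: they only touch the in-memory table.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Freeze the memtable into a sorted, immutable run.
        self.runs.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def get(self, key):
        """Return (value, places_checked); newer data shadows older data."""
        checked = 1
        if key in self.memtable:
            return self.memtable[key], checked
        for run in self.runs:
            checked += 1
            if key in run:
                return run[key], checked
        return None, checked

    def compact(self):
        # Merge all runs into one; apply oldest first so newer values win.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [dict(sorted(merged.items()))]


db = TinyLSM(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("c", 3), ("a", 10), ("d", 4)]:
    db.put(key, value)

before = db.get("b")   # has to look past the memtable and the newest run
db.compact()
after = db.get("b")    # fewer places to check once the runs are merged
```

Real engines soften this with bloom filters, block caches, and background compaction, but the trade-off is the same: cheap writes are paid for with extra work on reads and compaction.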

Query execution is also very different when a table's data is spread over multiple nodes. For example, joins are done on the query executor side by executing remote scans against each participating storage node and then merging the results. That's always going to be slower than a system that already has all the data locally.
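
A rough sketch of what that looks like, under simplifying assumptions: each table is split into key ranges, every range lives on a different node, and the executor pulls rows from each shard before joining them locally. The names (`RangeShardedTable`, `join_on_executor`, the split points) are made up for illustration, and a real executor would push filters down and batch its requests.

```python
import bisect


class RangeShardedTable:
    """Toy range-sharded table: each shard owns a contiguous key range."""

    def __init__(self, split_points):
        self.split_points = sorted(split_points)
        self.shards = [{} for _ in range(len(self.split_points) + 1)]
        self.remote_scans = 0  # counts simulated network round trips

    def shard_for(self, key):
        # A key equal to a split point belongs to the shard on its right.
        return bisect.bisect_right(self.split_points, key)

    def put(self, key, row):
        self.shards[self.shard_for(key)][key] = row

    def scan_shard(self, shard_id):
        # Stands in for a remote scan against one storage node.
        self.remote_scans += 1
        return sorted(self.shards[shard_id].items())

    def scan_all(self):
        rows = []
        for shard_id in range(len(self.shards)):
            rows.extend(self.scan_shard(shard_id))
        return rows


def join_on_executor(users, orders):
    """Hash join done by the executor after fetching rows from every shard."""
    users_by_id = dict(users.scan_all())
    joined = []
    for order_id, (user_id, amount) in orders.scan_all():
        if user_id in users_by_id:
            joined.append((order_id, users_by_id[user_id], amount))
    return joined


users = RangeShardedTable(split_points=[100, 200])
orders = RangeShardedTable(split_points=[1000])
users.put(5, "ada")
users.put(150, "bob")
users.put(250, "cy")
orders.put(10, (150, 30))
orders.put(1500, (5, 12))
orders.put(2000, (999, 7))

result = join_on_executor(users, orders)
total_remote_scans = users.remote_scans + orders.remote_scans
```

Even this tiny join needs one round trip per shard per table, which a single-node database never pays.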

YB also lacks some index optimizations. There is some work to make bitmap index scans work in YB, which will give a huge performance boost to many queries, but it's incomplete. YB does have some optimizations (like loose index scans) that Postgres does not have. So it's probably fair to say that YB is a lot slower than PG for some things and a little faster for others.
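
For anyone unfamiliar with loose index scans (also called skip scans), here is a minimal sketch of the general idea, not YB's implementation. It assumes an index modeled as a sorted list of `(a, b)` tuples; the helper names are hypothetical. Instead of reading every entry to find the distinct values of `a`, the scan seeks directly past each group.

```python
def first_position_after(entries, value, lo):
    """Binary search for the first entry whose leading column exceeds value."""
    hi = len(entries)
    while lo < hi:
        mid = (lo + hi) // 2
        if entries[mid][0] <= value:
            lo = mid + 1
        else:
            hi = mid
    return lo


def distinct_leading_values(entries):
    """Return (distinct leading values, number of seeks) for a sorted index."""
    distinct = []
    seeks = 0
    pos = 0
    while pos < len(entries):
        value = entries[pos][0]
        distinct.append(value)
        # Jump over the whole group instead of stepping through it row by row.
        pos = first_position_after(entries, value, pos)
        seeks += 1
    return distinct, seeks


# 2001 index entries, but only three distinct leading values.
index = sorted(
    [(1, b) for b in range(1000)] + [(2, b) for b in range(1000)] + [(7, 0)]
)
values, seeks = distinct_leading_values(index)
```

The win grows with the number of rows per distinct value: here three seeks replace a walk over 2001 entries.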

I think it's fundamentally not a bad architecture, just different from Postgres. So even though they took the higher layers from Postgres, there's a whole bunch of rearchitecting needed in order to make the higher layers work with the lower ones. You do get some Postgres stuff for free, but I wonder if the amount of work here is worth it in the end. So much in Postgres makes the assumption of a local page heap.

What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions. YDB is on Postgres 12, which came out in 2019, and is slowly upgrading to 15, which came out in 2022. By the time they've upgraded to 15, it will probably be 2-3 versions behind, and the work continues.

Worth noting: Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign.

Regarding the last point << Yugabyte was tested by Kyle Kingsbury back in 2019, which uncovered some deficiencies. Not sure what the state is today. The YB team also runs their own Jepsen tests now as part of CI, which is a good sign. >>

Please see this blog post https://www.yugabyte.com/blog/chaos-testing-yugabytedb/ for the latest updates, as well as information on additional frameworks built in-house for resiliency and consistency testing.

> What we see in cases where someone takes Postgres and replaces the guts (Greenplum, Cloudberry, and of course YDB) is that it becomes a huge effort to keep up with new Postgres versions.

The first upgrade is the hardest, but after that we will have the framework in place to perform consecutive upgrades much sooner. When the pg11 to pg15 upgrade becomes available, it will be in-place and online, without affecting DMLs. No other pg fork offers this capability today.

I was referring to the effort by the developers to keep the forked codebase itself up to date with mainline. Isn't that the main hurdle?

My understanding is that you are patching a lot of core Postgres code rather than providing the functionality through any kind of plugin interface, so every time there is a major Postgres release, "rebasing" on top of it is a large effort.

That, to my knowledge, is why Greenplum fell behind so much. It took them four years to get from 9.6 to 12, and I believe that's where they are today.

Citus is an extension. That's the best you can get by being outside the core db. If you want a truly distributed architecture, then you need to change the QO, DDL, transaction, and even query stat components. At that point it ends up being a fork.

Yes, the merges are hard. But pg12 changed lots of fundamental things making it very challenging. Pg15 to pg17 should be much simpler.

Yugabyte is Postgres compatible, not actually Postgres.

It's also only compatible insofar as you can use only a subset of Postgres features, as they're only supporting the most basic things like select, views, etc.

Triggers, notifies, etc. were out of scope the last time I checked (which has admittedly been a while).

You're right of course, it's not entirely compatible, but this might be interesting:

> We use vanilla Postgres as-is for the query layer and replace Postgres storage with YugabyteDB’s own distributed storage engine.

https://www.yugabyte.com/blog/yugabytedb-enhanced-postgres-c...

I only skimmed that blog post after a while because it reads like a pitch from an MBA targeted at venture capitalists.

I feel like that's not actually a distinction that matters to application developers, because they know that's just a technical detail that only concerns the developers of the database. Ultimately, all the compatibility has to be implemented in the storage engine. The fact that they're using Postgres's porcelain is surely a time saver for them, but of no consequence to the consumers/users of the database.

YugabyteDB supports much more than basic things. I've been a dev advocate for Yugabyte for 3+ years, and I've always seen triggers supported. LISTEN/NOTIFY is not there yet (it is an anti-pattern for horizontal scalability, but we will add it as some frameworks use it). It's not yet 100% compatible, but there's no Distributed SQL database with more PG compatibility. Many (Spanner, CRDB, DSQL) are only wire protocol + dialect. YugabyteDB runs Postgres code and provides the same behavior (locks, isolation levels, datatype arithmetic...).

One thing possibly holding some folks back is the version of Postgres it's held back to. Right now YDB has PostgreSQL 12 compatibility. Support for PG15 is under active development, so hopefully it's a 2025 feature. I really wanted to be able to actually use YugabyteDB for once, but our developers reportedly are using PG15+ features.

https://github.com/yugabyte/yugabyte-db/issues/9797

YDB is another database, they unfortunately didn't protect that trademark.

But they do call it yugabyteDB, YugabyteDB, YugaByte DB, yugabyte-db, and Yugabyte.

Right now it's based on PostgreSQL 11.2 with some patches pulled from newer releases. The upgrade will be to PG 15, and includes work to make further PG upgrades easier (think online upgrades of a cluster across PostgreSQL major versions).