| HN Mirror

by leononame 966 days ago

I've been waiting eagerly for this.

Do you have a clear position on which PostgreSQL features not to support? I suppose there are more than just some things that won't make the cut because of the architectural decisions.

While I understand the decision, I'm not sure it's the best way to go about it. If you only emulate a subset of PostgreSQL's syntax and features, few people will be compelled to switch because they might be afraid. For greenfield projects, most people would probably choose the MySQL syntax since it's the default.

I don't think this is about the necessity of running the PostgreSQL binary itself (although your approach already removes extensions, which for many people is a downer). It's just that you can't trust an emulated system to be 100% equal in behavior (and people rely on implicit behavior of a system all the time, unfortunately), and that might already be enough for a lot of people to not use it.

Have you guys already encountered some things in the PostgreSQL engine that just behave a bit differently from Dolt's engine? If so, what was your approach to mitigate it?

Edit: just wanted to add: I'm really impressed by your work and I'm looking forward to trying this out. I don't mean to be mean, these are genuine questions I have. Congratulations on the launch.

Hydrocharged 965 days ago

> Do you have a clear position on which PostgreSQL features not to support? I suppose there are more than just some things that won't make the cut because of the architectural decisions.

Eventually, we'd like to support the entirety of PostgreSQL's feature set, even including features like extensions. Dolt (https://github.com/dolthub/dolt), our first product, is to MySQL as DoltgreSQL is to Postgres, and we're taking a no-compromises approach to what we support. That, of course, means that there are a lot of features that need to be implemented, but Dolt is already almost there. For the majority of customers, Dolt has implemented everything they need from MySQL.

I'd definitely recommend checking out how Dolt compares with MySQL to see how we're approaching compatibility. All behavior, implicit and explicit, is something that we aim to model, and any deviations are considered bugs that we need to fix. There are exceptions, but those are only used when we feel it's for good reason (an example being how MySQL handles collation cascading in some circumstances).

> Have you guys already encountered some things in the PostgreSQL engine that just behave a bit differently from Dolt's engine? If so, what was your approach to mitigate it?

DoltgreSQL is at an extremely early stage. We're still working on getting the basic functionality working before we start rigorously testing to make sure that we match PostgreSQL's behavior. However, we can point to our approach with Dolt and MySQL for how we plan to handle DoltgreSQL and PostgreSQL. For every feature we implement, we compare the functionality with what is written in MySQL's documentation as a baseline. From there, we move on to comparing the output across a range of input statements. Sometimes the documentation differs from MySQL's own results, and we then try to find out why that's the case (Configuration? Out of date documentation? Bug? etc.).
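
To make the "compare the output across a range of input statements" step concrete, here is a minimal sketch of that kind of differential check. It is not Dolt's actual test harness: the names `run_statements` and `diff_engines` are hypothetical, and two in-memory SQLite connections stand in for the reference engine and the engine under test purely so the example runs on its own.

```python
import sqlite3

def run_statements(conn, statements):
    """Run each statement and record either its rows or the error type."""
    results = []
    for sql in statements:
        try:
            rows = conn.execute(sql).fetchall()
            results.append(("ok", rows))
        except sqlite3.Error as exc:
            # Errors are part of observable behavior, so they are compared too.
            results.append(("error", type(exc).__name__))
    return results

def diff_engines(reference, candidate, statements):
    """Return (statement, reference_result, candidate_result) for each mismatch."""
    ref = run_statements(reference, statements)
    cand = run_statements(candidate, statements)
    return [(s, r, c) for s, r, c in zip(statements, ref, cand) if r != c]

# Illustrative statements only; a real suite would cover far more cases.
statements = [
    "CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO t VALUES (1, 'a'), (2, 'b')",
    "SELECT name FROM t ORDER BY id",
    "SELECT * FROM missing_table",
]

# Assumption: both "engines" are SQLite stand-ins, so no mismatches are expected.
reference = sqlite3.connect(":memory:")
candidate = sqlite3.connect(":memory:")
mismatches = diff_engines(reference, candidate, statements)
print(len(mismatches), "mismatching statements")
```

Each entry in `mismatches` would then be triaged by hand, along the lines described above: configuration difference, outdated documentation, or a real bug.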

We also use external benchmarks to measure our correctness versus MySQL. In one such benchmark, containing around 6 million tests, Dolt recently reached 99.99% compared to MySQL (https://www.dolthub.com/blog/2023-10-11-four-9s-correctness/).
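
For a sense of scale, the arithmetic behind that figure can be sketched as follows. The exact test count is an assumption ("around 6 million" is rounded to 6,000,000 here), so the resulting number of failing tests is only approximate.

```python
def correctness(passed, total):
    """Percentage of tests whose result matches the reference engine."""
    return 100.0 * passed / total

total = 6_000_000  # assumption: "around 6 million" rounded for illustration
# 99.99% correct leaves 0.01% failing, i.e. one test in every 10,000.
failures_at_four_nines = total // 10_000
score = correctness(total - failures_at_four_nines, total)
print(failures_at_four_nines, round(score, 2))
```

Under that assumption, four nines of correctness still leaves a few hundred mismatching tests to investigate.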

I hope this answered your questions! Let me know if you have any more :)

leononame 965 days ago

That does answer my questions. It's an extremely ambitious undertaking and I wish you the best. I'll be following this closely.

Some people do performance optimisations based on PostgreSQL's inner workings (e.g. trying to force data that is rarely read but not small into TOAST). How far do your ambitions go? Are you planning on modeling internal behavior like this as well? Do you think putting an abstraction layer like this on top of Dolt will hurt performance?

Do you have a time frame (probably not) or a roadmap for Doltgres?

Hydrocharged 965 days ago

Thank you!

Performance optimizations are tackled a bit differently than correctness ones. For the most part, we'll try to use metrics to find weak points in the execution graph and optimize those, but we won't go so far as to try to model the internal performance behavior, in part because our storage format is so different that we'll necessarily have different performance characteristics.

We don't have a public roadmap for Doltgres just yet, but we're hoping to put one out quite soon! We have a lot of low-hanging roadblocks that we want to take care of before we can get a better look at the overall time frame. We should definitely have one up by the end of the month, but I don't want to commit to a time before that.