| I've been waiting eagerly for this. Do you have a clear position on which PostgreSQL features not to support? I suppose more than a few things won't make the cut because of the architectural decisions. While I understand the decision, I'm not sure it's the best way to go about it. If you only emulate a subset of PostgreSQL's syntax and features, few people will be compelled to switch because they might be afraid. For greenfield projects, most people would probably choose the MySQL syntax since it's the default. I don't think this is about the necessity of running the PostgreSQL binary itself (although your approach already removes extensions, which for many people is a downer). It's just that you can't trust an emulated system to be 100% equal in behavior (and people rely on implicit behavior of a system all the time, unfortunately), and that might already be enough for a lot of people not to use it. Have you guys already encountered some things in the PostgreSQL engine that just behave a bit differently from Dolt's engine? If so, what was your approach to mitigate it? Edit: just wanted to add: I'm really impressed by your work and I'm looking forward to trying this out. I don't mean to be mean, these are genuine questions I have. Congratulations on the launch. |
> While I understand the decision, I'm not sure it's the best way to go about it. If you only emulate a subset of PostgreSQL's syntax and features, few people will be compelled to switch because they might be afraid.
Eventually, we'd like to support the entirety of PostgreSQL's feature set, including features like extensions. Dolt (https://github.com/dolthub/dolt), our first product, is to MySQL what DoltgreSQL is to Postgres, and we're taking a no-compromises approach to what we support. That, of course, means that there are a lot of features that need to be implemented, but Dolt is already almost there. For the majority of customers, Dolt has implemented everything they need from MySQL.
I'd definitely recommend checking out how Dolt compares with MySQL to see how we're approaching compatibility. All behavior, implicit and explicit, is something that we aim to model, and any deviations are considered bugs that we need to fix. There are exceptions, but we only make them when we feel there's a good reason (an example being how MySQL handles collation cascading in some circumstances).
> Have you guys already encountered some things in the PostgreSQL engine that just behave a bit differently from Dolt's engine? If so, what was your approach to mitigate it?
DoltgreSQL is at an extremely early stage. We're still getting the basic functionality working before we start rigorously testing that we match PostgreSQL's behavior. However, we can point to our approach with Dolt and MySQL for how we plan to handle DoltgreSQL and PostgreSQL. For every feature we implement, we compare the functionality with what is written in MySQL's documentation as a baseline. From there, we move on to comparing the output across a range of input statements. Sometimes the documentation differs from MySQL's own results, and we then try to find out why that's the case (Configuration? Out-of-date documentation? Bug? etc.).
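To make the "compare the output across a range of input statements" step a bit more concrete, here is a rough Python sketch of that kind of side-by-side comparison. It is only an illustration, not our actual test harness: `diff_engines`, `run_safely`, and the two runner callables are hypothetical names, and the dictionaries in the usage example are stand-ins for real connections to a reference engine and a candidate engine.

```python
from typing import Callable, Iterable, List, Tuple

# A runner takes one SQL statement and returns the result rows.
# Hypothetical interface: in practice each runner would wrap a real
# database connection to one of the two engines.
Runner = Callable[[str], list]
Outcome = Tuple[str, object]


def run_safely(run: Runner, statement: str) -> Outcome:
    """Run a statement and capture either its rows or the error type."""
    try:
        return ("ok", run(statement))
    except Exception as exc:
        # Errors are part of observable behavior, so they are compared too.
        return ("error", type(exc).__name__)


def diff_engines(
    statements: Iterable[str],
    run_reference: Runner,
    run_candidate: Runner,
) -> List[Tuple[str, Outcome, Outcome]]:
    """Return (statement, expected, actual) for every statement that differs."""
    mismatches = []
    for statement in statements:
        expected = run_safely(run_reference, statement)
        actual = run_safely(run_candidate, statement)
        if expected != actual:
            mismatches.append((statement, expected, actual))
    return mismatches


if __name__ == "__main__":
    # Stand-in engines: canned results keyed by statement (made-up values).
    reference = {"SELECT 1 + 1": [(2,)], "SELECT 'a' = 'A'": [(1,)]}
    candidate = {"SELECT 1 + 1": [(2,)], "SELECT 'a' = 'A'": [(0,)]}

    for statement, expected, actual in diff_engines(
        reference, reference.__getitem__, candidate.__getitem__
    ):
        print(f"MISMATCH: {statement!r}: expected {expected}, got {actual}")
```

Each mismatch is then something to investigate by hand: it could be a bug in the candidate, a configuration difference, or a case where the documentation and the reference engine disagree.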
We also use external benchmarks to measure our correctness versus MySQL. In one such benchmark, containing around 6 million tests, Dolt recently reached 99.99% correctness compared to MySQL (https://www.dolthub.com/blog/2023-10-11-four-9s-correctness/).
I hope this answered your questions! Let me know if you have any more :)