Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten. It'd be great to see all other C code rewritten first!
That was my take when libSQL was announced. It still is, and it would remain my take if libSQL stayed C-coded. But a Rust-coded rewrite of SQLite3 or libSQL is a different story.
The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary, and they don't accept contributions to either. This incentivizes anyone who needs support and/or new features in SQLite3 to join the SQLite Consortium. It's a great business model -- I love it. But there are many users who want more of a say than even being a consortium member would grant them, and they want to contribute. For those users only a fork would make sense. But a fork would never gain much traction, given that the test suite is proprietary and the SQLite3 team is so awesome.
However, a re-implementation of SQLite3 in a memory-safe language is a very different story. The U.S. government wants everyone to abandon C/C++ -- how will they do this if they depend on SQLite3? Apart from that, there's also a general interest in, and need for, memory-safe languages.
That said, you're right that there are many other projects that call for a rewrite in Rust way before SQLite3. The thing is: if you have the need and the funding, why wouldn't you rewrite the things you need first? And if SQLite3 is the first thing you need rewritten, why not?
>This is going to sound pedantic, but SQLite is not Open Source. It's Public Domain.
Well, there are 2 different modes of communication:
(1) official language-lawyer pedantic communication: "open source" != "public domain"
(2) conversational casual chitchat: "open source" includes "public domain"
Yes, the SQLite home page does say "public domain". However, when people interview SQLite's creator, Richard Hipp, he himself calls it "open source". He also doesn't correct others when they also call it "open source". Excerpt from R. Hipp:
> So, I thought, well, why can't I have a database that just reads directly off the disc? And I looked around and there were none available. I thought, “oh, I'll just write my own, how hard can that be?” Well, it turns out to be harder than you might think at first, but I didn't know that at the time. But we got it out there and I just put it out as open source. And before long, I started getting these phone calls from the big tech companies of the day, like Motorola and AOL, and, “Hey, can you support this?”, and “Sure!” And it's like, wow, you can make money by supporting open source software?
it's wrong though. like, can't be more wrong than that. you can't do whatever you want with open source software; the license tells you what you can and cannot do.
with public domain software you can do most things.
Open source means just that: that the source is open. The OSI and co. re-defining the term to suit their ideological preferences doesn’t really change that. SQLite is open source, even if it’s not Open Source.
I don't know where you got this idea but it's not true. The OSI is simply defending the definition as it has been generally understood since the start of its usage in the 1980s by Stallman and others.
The only group of people "re-defining" -- quite successfully I suppose, which you are an example of -- what open source software means are those that have a profit motive to use the term to gain traction during the initial phase where a proprietary model would not have benefited them.
I don't think I need to provide concrete examples of companies that begin with an open source licensing model, only to rug-pull their users as soon as they feel it might benefit them financially; these re-licensing discussions show up on HN quite often.
I don't understand why OSI didn't pick an actually trademarkable term and license its use to projects that meet its ideals of open-sourceness. OSI knows it has no right to redefine common language and police its usage, any more than a grammar pedant has the right to levy fines against those of us who split infinitives.
(To be fair to OSI, I've never seen any of their representatives do this. But the internet vigilante squad they've spawned feels quite empowered to let us know we've broken the rules.)
> conversational casual chitchat: "open source" includes "public domain"
No. What are you talking about? They are not related... other than for people virtually completely new to, well, open source.
You are also completely confused here:
> Yes, the SQLite home page does say "public domain". However, when people interview SQLite's creator, Richard Hipp, he himself calls it "open source". He also doesn't correct others when they also call it "open source".
They are different things. A project can be both; a person can talk about these two aspects of one project.
This quickly gets into the details of definitions, but I think by most people's definitions of 'open source', something that is 'public domain' qualifies as such (compare 'source available' and 'copyleft/free software': the former is not quite open source, and the latter is a more restrictive kind of open source. 'Permissive' licenses like MIT are closer to public domain but differ from it to varying degrees of technicality). One of the main problems with 'public domain' is that it's not universally accepted that there's any means to deliberately place a copyrightable work into it, so something like SQLite, whose authors are not long dead, is not actually public domain according to many jurisdictions.
It's a difference only insofar as, in many jurisdictions, their claim that it's public domain has no legal value. If it were truly public domain (e.g. if the authors were long dead), it would be open source. But far from all places allow you to arbitrarily put things in the public domain.
I'm a bit puzzled why SQLite doesn't solve this trivial issue by claiming the code is CC0-licensed. CC0 is made just for that: a very wordy way to make it as close to public domain as possible in each jurisdiction.
On the other hand, hobbyists won't care. As long as you trust their intention to have it be open source, they won't sue you for infringement either. And if, as a company, you need more assurance than "it's public domain", they are kind enough to sell you a fancy, legally satisfying piece of paper for an undisclosed price. It's a subtle but clever way to get income from users with too much money.
They explicitly state, "Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means."
> They explicitly state, "Anyone is free to copy, modify, publish, use, compile, sell, or distribute the original SQLite code, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means."
It's not clear this is a license grant rather than legal advice (which would be correct legal advice if the code were public domain, but it is not).
Is it though? The website does say "All of the code and documentation in SQLite has been dedicated to the public domain by the authors" but copyright law has no exception for "dedications" to the public domain. At best the authors are estopped from bringing suit but even that is unclear.
Companies can buy licences if they're uncomfortable with the Public Domain dedication:
> Licenses are available to satisfy the following needs:
> * You want indemnity against claims of copyright infringement.
> * You are using SQLite in a jurisdiction that does not recognize the public domain.
> * You are using SQLite in a jurisdiction that does not recognize the right of authors to dedicate their work to the public domain.
> * You want to hold a tangible legal document as evidence that you have the legal right to use and distribute SQLite.
> * Your legal department tells you that you have to purchase a license.
They could have CC0 licensed the code or they could have said they would not enforce their copyright. They did neither. SQLite is closed source. The "dedication" (which has no legal effect, what does it even mean?) encourages widespread adoption and big players are spooked into paying for a license (or "warranty of title"). That's quite a strategy.
capitalization doesn't carry meaning in these contexts.
open source means OSI compliant, broadly speaking, and licensed as such.
in contrast, public domain doesn't exist in some jurisdictions, which is why sqlite as a company had to create an option to provide an official license. which they found so annoying that they charged a sweet fee to send a signed printed letter...
They don't own the words "open source" no matter how much they might like to.
> “Open Source” describes a subset of free software that is made available under a copyright license approved by the Open Source Initiative as conforming with the Open Source Definition.
No it doesn't. It describes software whose source is "open" which is generally understood to mean that you can read, modify and reuse the code for free.
Public domain definitely fits that. The "public domain doesn't exist in some countries" arguments are spurious as far as I can tell.
It is absolutely true that a work can be in the public domain and not have source available (or even contributable). But that doesn't really matter to most people. The question for most people is not whether something is open source, but whether they can copy and make use of a work without being held liable for copyright infringement. SQLite happens to be both public domain and open-source to an extent (i.e., source available).
Conversely, open source doesn't necessarily mean "free to use without encumbrance." There are many open-source licenses that forbid certain uses (e.g. Business Source License). On the other hand, a work in the public domain is free to be used by all without restriction.
A better analysis of open source vs. public domain would be in the form of a square, where one dimension would be the right to use the work, and the other dimension would be the ability to obtain and contribute source code.
The Business Source License is not an open source license. Open source does mean "free to use without encumbrance" - see points 5 and 6 of the Open Source Definition at https://opensource.org/osd
No one's saying public domain isn't useful. You're replying to a comment that's specifically and solely combatting the idea that public domain means open source.
Any definition of open source that doesn't include the public domain is out of touch with how real people use the words "open source" and is therefore useless. You can make up any definition you want, but if you insist on calling elephants "bananas", I'm not going to take you seriously.
The problem with your analogy is that open source has a definition. As does public domain. As do elephants and bananas.
In your analogy we're not the ones calling elephants bananas, you are. We want to keep calling one bananas and the other elephants. You are suggesting that since elephants are similar to bananas you can simply use either word.
Legally, Open Source and Public Domain are -very- different animals. Open Source comes with a copyright and a license (which has requirements); public domain does not.
Of course public domain and open source are both "shipped as source code". Then again so is a fair bit of proprietary software. That doesn't make it open source either.
How people use the term "fair use" is out of touch with the legal definition. That doesn't change the legal definition, it means people use the term incorrectly.
> public domain software may be free software but is not certain to be.
Open Source relies on copyright and contract law (which are somewhat standardized or at least understood due to their importance in commerce). Public domain relies on other laws that can vary significantly.
As far as I can see, these tests come with the same public domain dedication as the rest of the code.
You may be referring to the TH3 tests (https://sqlite.org/th3.html). The main goal (100% branch coverage, 100% MC/DC) would not be achievable for a Rust implementation (or at least an idiomatic Rust implementation …) because of the remaining dynamic run-time checks Rust requires for safety.
sqlite also has some runtime checks that are expected to be always true or always false, and solves that with custom macros (ALWAYS() and NEVER()) that remove these branches in branch-coverage test builds.
The same would be possible in Rust. Everything that could panic has a non-panicking alternative, and you could conditionally insert `unreachable_unchecked()` into error-handling branches to remove them. That wouldn't be the most idiomatic approach, but SQLite's solution is also a custom one.
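A minimal sketch of that idea, assuming a hypothetical `coverage_test` cfg flag (enabled with `rustc --cfg coverage_test`) standing in for SQLite's coverage-test build. `read_page` and its string error are made up for illustration; `slice::get` and `std::hint::unreachable_unchecked` are standard Rust. Whether a given coverage tool then stops reporting the removed arm is not something this sketch verifies, and marking a reachable branch unreachable is undefined behavior, so this is only sound for truly impossible cases.

```rust
// Sketch: strip a "can't happen" error branch from coverage-test builds.
// `coverage_test` is a hypothetical cfg, set via `rustc --cfg coverage_test`.

// Coverage-test build: declare the defensive branch unreachable so the
// optimizer is allowed to delete it.
#[cfg(coverage_test)]
macro_rules! defensive_fail {
    ($err:expr) => {
        // SAFETY: the caller asserts this branch can never execute.
        unsafe { std::hint::unreachable_unchecked() }
    };
}

// Normal build: the defensive branch returns an error instead of panicking.
#[cfg(not(coverage_test))]
macro_rules! defensive_fail {
    ($err:expr) => {
        return Err($err)
    };
}

/// Hypothetical page lookup using the non-panicking `get` instead of `pages[idx]`.
fn read_page(pages: &[u32], idx: usize) -> Result<u32, &'static str> {
    match pages.get(idx) {
        Some(&page) => Ok(page),
        None => defensive_fail!("page index out of range"),
    }
}

fn main() {
    let pages = [10, 20, 30];
    assert_eq!(read_page(&pages, 1), Ok(20));
    // Only safe to exercise in normal builds; in a coverage build this is UB.
    if cfg!(not(coverage_test)) {
        assert_eq!(read_page(&pages, 7), Err("page index out of range"));
    }
    println!("ok");
}
```

The trade-off is the same one SQLite makes with its macros: the coverage build is deliberately not the build you ship.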
> The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary
no.
the business model is services, and a red phone to companies who use sqlite in production. like nokia back in the day, when we had these little flip phones, or desk phones had a "rolodex" built in, or many other embedded uses of a little lovely dependable data store.
the services include porting to and "certification" on specifically requested hardware and OS combinations, with indeed proprietary test suites. now, these are not owned by sqlite but by third parties, which license them to sqlite (the company).
and it started with being paid by the likes of nokia or IBM to make sqlite production ready, add mc/dc coverage, implement fuzzing, etc.
their license asks you to do good, not evil. and they take that seriously and try their best to do the same. their own stuff is, to an extreme extent, in the public domain.
It's not just old Nokias or desk phones, nor just embedded systems. sqlite is almost everywhere. Adobe, Apple, Microsoft, Google, Mozilla and many other companies use it in very widely deployed software.
> > The SQLite3 business model is that SQLite3 is open source but the best test suite for it is proprietary
> no.
> the business model is services, and a red phone to companies who use sqlite in production. like nokia back in the day, when we had these little flip phones, or desk phones had a "rolodex" built in, or many other embedded uses of a little lovely dependable data store.
Members of the SQLite Consortium surely have this "red phone" you speak of. So in what way was my characterization of their business model wrong?
> The U.S. government wants everyone to abandon C/C++
That's the position of two federal agencies, namely the FBI and CISA. They don't describe how this change will reduce CVEs or why the languages they prefer still produce projects with CVEs.
I don't hold the technical or social acumen of the FBI or CISA in particularly high regard, and I'm not sure why anyone would by default either. Mostly because they say things like "switch to python!" without once accounting for the fact that Python is written in C.
It's an absurd point to invoke as a defense of this idea.
You keep and maintain your local fork that does what you need it to do. Perhaps, if you are charitable, you share it with others. But you don't need to do this, and it just adds support burden.
Even without that, it’s helpful. It means there is less (no?) undefined behavior that you will need to emulate to maintain compatibility. You can just follow the spec.
If you can not run the test suite, then how do you know that you properly followed the spec? And did so securely? And in a performant manner? Even for edge cases? On obscure hardware, filesystems, and OSes? Even if the power cuts out? Or the cable to the hard drive (transactions)? Even if a stray cosmic ray flips a bit?
By the way, SQLite itself does not meet one of these criteria. Know which one? ))
I’m not sure what your point is? Yes, it would be better if they would run their tests against your fork. But they won’t. Still, it’s better for the fork writer that they exist.
> Given the code quality and rigid testing, SQLite is probably the last project that should be rewritten.
That was my take for many years, but I have come around 180 degrees on this. I think at this point it's very likely, and probably necessary, that SQLite will eventually be rewritten. In part because of what is called out in the blog post: the tests are not public. More importantly, the entire project is not really open. And to be clear: that is okay. The folks building it want it that way, and that's the contract we have as users.
But it does make certain quite exciting things really tricky. So yes, I do think SQLite could use some competition, even if just to find new ways to influence the original project.
This reminds me of Vim - and after quite some time, I believe that all Vim users will agree that adding Neovim to the ecosystem improved Vim itself. Vim 8 addressed over half the issues that led to the Neovim fork in the first place - with the exception of the issue of user contributions, of course.
A company that works with SQLite and prefers to write Rust has the expertise needed to rewrite SQLite in Rust. That’s what they’re doing.
All the other C code could be rewritten, this doesn’t stop or slow down any such effort. But for sure it was never going to be possible for a database provider to start making a memory safe implementation of libpng or something.
Seems like a potentially interesting project for getting rid of SQLite's compatibility baggage (e.g. non-strict tables, opt-in foreign keys, the oddities around rowid tables, etc.) as well as for progressing the dialect a bit (types and domains, for instance).
But the article mentions that they intend to have full compatibility:
> Our goal is to build a reimplementation of SQLite from scratch, fully compatible at the language and file format level, with the same or higher reliability SQLite is known for, but with full memory safety and on a new, modern architecture.
If you "intend to get rid of some of the baggage" you won't be fully compatible.
libSQL already isn't fully compatible: as soon as you add a RANDOM ROWID table, you get "malformed database schema" when using (e.g.) the sqlite3 shell to open your file (Litestream also doesn't work, etc.).
And that's fine, as there probably is no better way of doing what you needed to do. But it's also taking what SQLite offers and breaking the ecosystem, under the cover of "we're compatible", without ever calling out what compromises are being made.
You also never got round to documenting the internal Virtual WAL APIs you exposed. This is something where SQLite is lacking, where you could've made an impact without any compatibility issues, and pressure upstream to release something by doing it first/better. Alas, you did it for Turso's exclusive benefit.
Once you compile your Typescript to Javascript, Javascript runtimes can run it, Javascript code can call it, etc. Even source maps work.
Once you start using libSQL features, SQLite tools will simply stop working with your databases.
That means the sqlite3 shell stops working, backup solutions like Litestream and sqlite-rsync stop working, SQLite GUIs like SQLiteStudio stop working, forensic and data-recovery tools will have a harder time working, etc.
Maybe it's all worth it, but it's not full compatibility, and it should at least be documented.
I would guess "full memory safety" is going to be impossible, at least at compile time. I'd guess that, if for no other reason than performance, SQLite uses data-oriented techniques that effectively reduce pointers to indices, which will no longer have ownership or lifetime tracking in the Rust compiler.
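To make that trade-off concrete, here is a minimal sketch of the pattern, with hypothetical names (`Arena`, `NodeId`, `Node`) that are not taken from SQLite or Limbo. Nodes live in a `Vec` and refer to each other by index: the compiler no longer tracks lifetimes for those indices, but every access is still bounds-checked, so a stale index becomes a logic bug rather than undefined behavior. The generation counter is an added illustration of one common way (used by crates such as `slotmap` and `generational-arena`) to catch stale indices at runtime.

```rust
// Hypothetical tree node; a real B-tree page would hold cells, keys, etc.
#[derive(Debug)]
struct Node {
    key: i64,
    children: Vec<NodeId>, // "pointers" are plain indices, not references
}

/// Index into the arena; the borrow checker tracks no lifetime for it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct NodeId {
    index: usize,
    generation: u32,
}

struct Slot {
    generation: u32,
    node: Option<Node>,
}

#[derive(Default)]
struct Arena {
    slots: Vec<Slot>,
    free: Vec<usize>, // indices of empty slots available for reuse
}

impl Arena {
    fn insert(&mut self, node: Node) -> NodeId {
        if let Some(index) = self.free.pop() {
            // Reusing a slot bumps its generation, invalidating old ids.
            let slot = &mut self.slots[index];
            slot.generation = slot.generation.wrapping_add(1);
            slot.node = Some(node);
            NodeId { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, node: Some(node) });
            NodeId { index: self.slots.len() - 1, generation: 0 }
        }
    }

    fn remove(&mut self, id: NodeId) -> Option<Node> {
        let slot = self.slots.get_mut(id.index)?; // bounds-checked
        if slot.generation != id.generation {
            return None; // stale id: slot was reused
        }
        let node = slot.node.take()?; // None if already removed
        self.free.push(id.index);
        Some(node)
    }

    fn get(&self, id: NodeId) -> Option<&Node> {
        let slot = self.slots.get(id.index)?; // bounds-checked
        if slot.generation == id.generation { slot.node.as_ref() } else { None }
    }
}

fn main() {
    let mut arena = Arena::default();
    let leaf = arena.insert(Node { key: 2, children: vec![] });
    let root = arena.insert(Node { key: 1, children: vec![leaf] });
    arena.remove(leaf);
    let reused = arena.insert(Node { key: 3, children: vec![] });
    // `root` still lists the old leaf id; it is rejected instead of
    // silently resolving to the new node that now occupies its slot.
    let child = arena.get(root).unwrap().children[0];
    assert!(arena.get(child).is_none());
    assert_eq!(arena.get(reused).map(|n| n.key), Some(3));
    println!("ok");
}
```

Without the generation check, `get(child)` would have returned the unrelated node with key 3: memory-safe, but wrong, which is exactly the class of bug the compiler stops catching once pointers become indices.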
As a counterpoint, doing a rewrite of an example of the best C codebases gives you a much more interesting comparison between the languages. Rewriting a crappy C codebase in a modern, memory safe language is virtually guaranteed to result in something better. If a carefully executed rewrite of SQLite in Rust doesn't produce improvements (or some difficult tradeoffs), that's very informative about the relative virtues of C and Rust.
Code quality is not the only thing to consider. Some people would love to see something like SQLite with 2 important changes: referential integrity that respects the DDL and strict tables that also respect the DDL.
An SQLite fork will have a hard time being compelling enough to draw users away from the main project. Being written in Rust is the most compelling reason that I could think of. SQLite has many annoying quirks (foreign key constraints disabled by default and non-strongly-typed columns are my two pain points) but a fork that addresses them would still not pull me away from the original project that I have so much trust in.
If I were to fork SQLite, drawing users away from the main project would be a non-goal. The goal would be to get strict tables and foreign key constraints enforced 100% of the time.
Yeah, I would assume that any project like this would strive to be a soft fork that just has a few minimal patches to address specific needs, not something that actually tries to compete with the original.
If that was the case, they wouldn't introduce cross incompatibilities in the changes they made (or would at least discuss compatibility in the docs), and they'd make any added features useful to others by properly documenting them.
Compatibility for libSQL is a one way street. I don't expect Limbo to be any different.
Agreed! Rewriting in Rust (or any other language) is not required for those features. A fork and modifying the existing C code could also result in those features (and I might do just that if it doesn't come around soon).
Here is the STRICT table type page: https://www.sqlite.org/stricttables.html
It is fairly straightforward: you just have to add STRICT to your table definition and you have it.
And the FOREIGN KEY support is here: https://www.sqlite.org/foreignkeys.html
The two requirements are that your build not have it disabled, and that you execute `PRAGMA foreign_keys = ON;` when you open the database (every time you open the database).
Then build with SQLITE_DEFAULT_FOREIGN_KEYS=1 to make it opt-out (and to opt-out you'd need to inject SQL).
As for STRICT: if you make your tables STRICT, there's no opt-out.
So why is this an issue? Do you want them to break the file format to say "from this version forward, all tables are STRICT"? What does that really buy you?
It's an embedded database: anyone who can mess with your database and circumvent integrity can also open the file and corrupt it.
I agree on a level that SQLite is a master class in testing and quality. However, considering how widely used it is (essentially every client application on the planet) and that it does get several memory safety CVEs every year, there is some merit in a rewrite in a memory-safe language.
I agree with you on one level, but that same code rigidity and testing means a port of SQLite is much more viable than a port of most other C-based projects. And I'm intrigued by what this would enable, e.g. the WASM stuff the authors mention. It's not that it couldn't be done in C, but it'll be easier for a wider range of contributors to do it in Rust.