Hacker News
by srer 1493 days ago
Wanting SQLite in Go touches on something that I think is quite a waste in modern Go circles, but happens everywhere to varying degrees.

There's often (for instance, in Go projects wanting to avoid cgo) a desire for everything to be in the single source language - Go. In what resembles NIH syndrome, there will be clones of existing libraries, offering little over the original except being "Written in Go". From experience this often makes for more bugs, as the Go version is commonly much younger and less widely used than the existing non-Go library.

The Python world does it a lot less; perhaps the slowness of Python helps encourage using non-Python libraries in Python modules. But that sure does make building and distributing Python projects "fun".

What I'm trying to say is that:

A world where every language community has its own SQLite project because the communities shun code written in other languages just feels like a profound waste.

17 comments

It's not as simple as NIH. Using cgo means you have to build and link external dependencies in another language. Compared to pure Go, which produces a single binary, this makes things quite a bit more difficult to build and distribute. But I think the biggest reason is that cgo is slow. Unlike Rust, which uses the C calling convention (correct me if I'm wrong) and has no garbage collector or coroutine stacks to worry about, Go has a hefty price to pay when calling a C function. For something like SQLite, that causes a noticeable slowdown. Typically what I like to do in this case is write a C wrapper around the slow API that lets me batch work or combine multiple calls into one. With SQLite, instead of fetching rows one at a time, I would write a wrapper that takes some arrays and/or buffers to fetch multiple rows at a time. That amortizes the overhead and makes it less significant. When that's not possible, I use unsafe and/or assembly to lift just the problematic C calls into Go. That can sometimes work wonders, but it's also not a magic bullet.
> correct me if I'm wrong

Rust can use the C calling convention to call C functions or export functions to C code, but this requires extra annotations. By default, Rust uses its own unstable ABI.

Interesting! I've never considered batching my calls to SQLite from Go that way. Do you have any numbers you can share about performance when doing that?
I don't, but in general it matters most for the cheapest C calls, so the functions doing the least work. Batching those somehow can give big speedups, over 2x, depending on how much of the total time was going into cgo overhead.
To be honest, the reason why Go developers want "pure Go" libraries is simply that they can be statically linked and cross-compiled, and used without having to carry around an additional library, especially in environments where you hardly have anything other than the binary itself (e.g. Docker containers "FROM scratch" or Gokrazy).
In this case sqlite is bundled as a single C source file. You could just use Zig as your C cross compiler to cross compile alongside Go for almost any platform[1].

[1]: https://dev.to/kristoff/zig-makes-go-cross-compilation-just-...

Except that’s at the very least broken for macOS -> Linux right now. I attempted to use zig cc as the cross compiler for sqlite3 the other day. Ran into https://github.com/ziglang/zig/issues/5882 https://github.com/ziglang/zig/issues/9485 an almost two-year-old issue.

Cross compiling C sucks.

It is a bug, and we will fix it, but keep in mind the scope - this is something that affects old versions of glibc. Newer versions of glibc are not affected, and neither is musl libc (often preferable for cross compiling to Linux).

You can target a newer glibc like this: -target x86_64-linux-gnu.2.28

You can target musl libc like this: -target x86_64-linux-musl

Still way more of a pain than if it was pure Go.
I just can't agree with this. It's true that one piece, the compiling, is less painful. But the entire rest of the system, from developing to testing to rooting out implementation bugs, is way, way more painful.
It's ridiculously more painful for a single one-off by a single dev for a single tool, but an actual ecosystem of pure Go reimplementations has popped up where that load can be shared collectively over time, and ultimately using native implementations doesn't pass that burden on to the end developer. It has to start somewhere, though.
I've seen the same thing in the Java world, I presume for the same reasons as in Go: calling native code in Java is painful, and traditionally Java always had an emphasis on "write once, run everywhere". So the Java community tends to reimplement C code in Java, even when limitations of the language make the code slower and/or more complex (see the classic post "Re: Why Git is so fast" aka "Why is C faster than Java" at https://marc.info/?l=git&m=124111702609723&w=2 or https://public-inbox.org/git/20090430184319.GP23604@spearce.... which mentions things like the lack of unsigned types or value types).
Lack of value types, yes; unsigned, not really, especially since Java 8 added utility methods for unsigned arithmetic.
It created a lot of churn, but I think it has been net positive for Go because they now have a huge ecosystem of stuff that can be installed without the black hole of C/C++ dependency installation.
Dependencies are sorted out when using Conan or vcpkg.

However the security aspect is much more relevant.

Coming to Go from Python, I thank god for every time I don't have to worry about a C compiler when installing something.

You lose probably a week of work every year to stuff like this in Python

Each call out to a C library consumes 1 OS thread (max tens of thousands of threads before terrible performance/scheduling issues); each call out to a Go library consumes a goroutine, of which you can have hundreds of thousands without much problem.
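As a rough illustration of the goroutine side of that comparison, here is a minimal sketch (runBlocked is a name made up for this example). It parks a large number of goroutines at once and then releases them; it does not measure the OS-thread side, which would need real cgo calls.

```go
package main

import (
	"fmt"
	"sync"
)

// runBlocked starts n goroutines that all block on the same channel,
// releases them together, and reports how many completed.
func runBlocked(n int) int {
	var wg sync.WaitGroup
	release := make(chan struct{})
	done := make(chan struct{}, n) // buffered so no goroutine blocks on send

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // parked here until the channel is closed
			done <- struct{}{}
		}()
	}

	close(release) // wake every parked goroutine
	wg.Wait()
	return len(done)
}

func main() {
	// 100,000 goroutines exist at the same time before release.
	fmt.Println(runBlocked(100000))
}
```

A goroutine parked on a channel costs a small stack and no OS thread, whereas a goroutine blocked inside a C call keeps an OS thread occupied until the call returns.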

For SQLite it seems it would be ok, as there's no network traffic, but I've had issues where network glitches (Kafka publisher C library) would cause unrecoverable CPU spikes and an increase in OS threads that never recovered.

So that's the functional reason behind the Go community's desire to write everything in Go. Plus a lot of the people who love Go also tend to be the sort who would enjoy rewriting C libraries in a nice new language.

Rewriting things in Go not only makes using them nicer, it often is a way to get a more correct and stable program. However, we are talking about sqlite here, one of the best tested and stable C programs out there. Rewriting it in Go rather raises the chance of a bug and that would be counter-productive.

It still can be an interesting project and if it proves to be correct and eventually shows some advantages vs. using the C version, it might become a nice alternative.

It's mostly autogenerated code and all the tests pass.
I guess the reason is that it is easier to cross-compile by keeping everything in Go? I have no knowledge of SQLite cross-compilation (on Linux targeting Windows, for instance); I guess it's also possible, but it makes the build process a little more complex.

Node.js can take advantage of WASM which is pretty handy in some cases.

Getting cgo to cross-compile while targeting less popular architectures can be a royal pain: I was trying to use cgo to add official SQLite to a Go app that I had running on a long-abandoned (by OEM) mips/linux 2.x kernel "IoT" device with an equally ancient libc. It was a Sisyphean task that absolutely nerd-sniped me; I spent way too much time building toolchains and trying to get them to work with cgo. I ended up going with a Go version of SQLite.
Yes, this is the reason. Using cgo to link in C libraries, for example, is slower and also brings up other difficulties if you want to cross-compile. Here's an old link (may be out of date) outlining some of them:

https://dave.cheney.net/2016/01/18/cgo-is-not-go

The main reason why I personally try to avoid adding non-go languages to my go projects is because it tends to make profiling / debugging a bit of a pain. pprof has limited vision into any external C threads so all you can see is the function call and the time goroutines spent off-cpu while waiting for it to finish. You can obviously supplement some of that with other tools (perf), but sometimes accepting the tradeoffs and using a Go implementation of the package instead just makes more sense.
> a desire for everything to be in the single source language

And that is good.

In particular, this is driven by how TERRIBLE all the dance around C is. SQLite is among the easiest, yet it also causes trouble: suddenly, you need to bring in a certain LLVM, Visual Studio Tools, etc. And then you HOPE all the other tools use the correct env vars, settings, etc.

And then, you hit a snag, and waste time dancing around C.

Yes 100%, here is my lament from 4 years ago on that topic.

https://news.ycombinator.com/item?id=16741043

A big part of my pain, and the pain I've observed in 15 years of industry, is programming language silos. Too much time is spent on "How do I do X in language Y?" rather than just "How do I do X?"

For example, people want a web socket server, or a syntax highlighting library, in pure Python, or Go, or JavaScript, etc. It's repetitive and drastically increases the amount of code that has to be maintained, and reduces the overall quality of each solution (e.g. think e-mail parsers, video codecs, spam filters, information retrieval libraries, etc.).

There's this tendency of languages to want to be the be-all end-all, i.e. to pretend that they are at the center of the universe. Instead, they should focus on interoperating with other languages (as in the Unix philosophy).

One reason I left Google over 6 years ago was the constant code churn without user visible progress. Somebody wrote a Google+ rant about how Python services should be rewritten in Go so that IDEs would work better. I posted something like <troll> ... Meanwhile other companies are shipping features that users care about </troll>. Google+ itself is probably another example of that inward looking, out of touch view. (which was of course not universal at Google, but definitely there)

This is one reason I'm working on https://www.oilshell.org -- with a focus on INTEROPERABILITY and stable "narrow waists" (as discussed on the blog https://www.oilshell.org/blog/2022/02/diagrams.html )

I think you need to look deeper - one of the strengths of Go is the runtime and everything they do there to support their internal threading model. When you are calling out to an external language you have memory being allocated and managed outside the Go runtime, and you have opaque blocks of code that aren’t going to let the Go runtime do anything else on the same CPU until they exit. Those are more the considerations behind wanting Go-native implementations. Even with SQLite, which is probably one of the most solid and thoroughly debugged pieces of code written since the Apollo program, it would be desirable to minimize the amount of data being copied across the runtime interface, and to allow other goroutines to run while I/O operations are in progress.
Well, SQLite is almost as valuable as a FILE FORMAT as it is as an application, as the "store config as SQLite" crowd would contend.

Thus language-native versions of SQLite really can be viewed as language-specific file format readers/writers, like any JSON/YAML library.
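To illustrate the "file format reader" view, here is a minimal sketch in pure Go that reads one field from a SQLite 3 database header. It relies on the documented layout (a 16-byte magic string followed by a 2-byte big-endian page size, where the value 1 means 65536); pageSize is a name made up for this example, and this is nowhere near a full reader.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// magic is the 16-byte string that opens every SQLite 3 database file.
const magic = "SQLite format 3\x00"

// pageSize validates the magic string and returns the database page size
// stored at byte offset 16 of the header.
func pageSize(header []byte) (int, error) {
	if len(header) < 18 || string(header[:16]) != magic {
		return 0, errors.New("not a SQLite 3 database header")
	}
	n := int(binary.BigEndian.Uint16(header[16:18]))
	if n == 1 {
		n = 65536 // the format encodes 65536 as 1
	}
	return n, nil
}

func main() {
	// A hand-built header fragment with a page size of 0x1000.
	h := append([]byte(magic), 0x10, 0x00)
	fmt.Println(pageSize(h)) // 4096 <nil>
}
```

In practice the bytes would come from the first 100 bytes of a database file rather than a hand-built slice.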

It's especially helpful to be pure Go when targeting both iOS and Android (in addition to Linux, Mac, and Windows) with https://github.com/fyne-io/fyne#about
I was a beginner in Go when I wanted to use SQLite with it, and I wanted an easy way to build without a lot of hassle. It looked like cgo was the only solution back then. I really wished there was something that I could easily use from Go rather than building with cgo.
I don't think the Go community is particularly susceptible to this. You mention Python; as you say, Python and the dynamic scripting languages are particularly "OK" with having things backed by C, because of the huge performance improvements you get from doing as much as possible in C in those languages. Dynamic scripting languages are slow. But these are the exceptions, not the rule.

Most other languages want native libraries not because of some bizarre fear of C, but because of the semantics. Native libraries work with all the features of the language, whatever they may be. A naive native binding to SQLite in Rust may be functional, but it will not, for instance, support Rust's iterators. That's kind of a bummer. Any real Rust library for something as big as SQLite will of course provide them, but as you go further down the list of "popular libraries" the bindings will get more and more foreign.

Also, the design of these dynamic scripting languages was non-trivially bent around treating the ability to bind to C as a first-class concern. I think if they had never been designed around that, there are many things that would not look the same. One big one is that Python would be cleanly multithreaded today if it didn't consider that a big deal, because the primary problem with the GIL isn't Python itself, but the C bindings you'd leave behind if you removed it. Go's issue is mostly that it came far enough into C's extremely slow, but steady, decline that it was able to make it a second-class concern instead of a first, and not force the entire language's design and runtime to bend around making C happy.

As it happens, in my other window, I'm writing Go code using GraphViz bindings, and I'm experiencing exactly this problem. It works, yes. But it's very non-idiomatic. I've had to penetrate the abstraction a couple of times to pass parameters down to GraphViz that the wrapper didn't directly support. (Fortunately, it also provided the capability to do so, but that doesn't always happen.) There's a function I have to call to indicate that a particular section is using the HTML-like label support GraphViz has, which in Go takes a string and appears to return the exact same string, but the second string is magical and if used as the label will be HTML-like.

This is not special to Go, I've encountered this problem in Python (the Tkinter bindings are a ton of "fun"; the foreign language in this case is Tcl, and if you want to get fancy you'll end up learning some Tcl too!), Perl, several other places. A native library would be much nicer.

Finally, the Go SQLite project isn't its own SQLite. It's actually a C-to-Go compilation, as I understand it. That's not really a separate project.

There are good reasons to avoid C dependencies in Go, though: C code is essentially a big black box to the runtime, so you lose some benefit when you go there (no pun intended).