by tptacek 47 days ago
"Idle cost is that one lightweight SELECT per millisecond per database — no page-cache pressure, no writer-lock contention, no kernel file watcher in the mix."

I think (respectfully) the LLM that probably wrote this overshot the mark here because busy-polling a select does not actually sound better to me than a "kernel file watcher".

7 comments

"one lightweight SELECT per millisecond"

This reminds me of the teenager who told her dad that she was just a tiny little bit pregnant.

One cannot be a little bit pregnant. But a DB can be only a little bit in the RAM, and specifically in the page cache. SQLite can act exactly like that, and it's damn fast as long as it does not need to durably write a transaction. Polling once a millisecond could spend a few microseconds.
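For anyone who wants to check the "few microseconds" guess on their own machine, here is a rough timing sketch in Python. It assumes the poll is a `PRAGMA data_version` against a WAL-mode database in a temp directory. The function name `average_poll_cost_us` and the table are made up for illustration, and the number will vary a lot with hardware and filesystem.

```python
import os
import sqlite3
import tempfile
import time


def average_poll_cost_us(polls: int = 10_000) -> float:
    """Return the mean wall-clock cost, in microseconds, of one idle poll."""
    with tempfile.TemporaryDirectory() as tmp:
        # isolation_level=None puts the connection in autocommit mode.
        conn = sqlite3.connect(os.path.join(tmp, "poll.db"), isolation_level=None)
        try:
            conn.execute("PRAGMA journal_mode=WAL")
            conn.execute("CREATE TABLE t (x INTEGER)")
            # The sqlite3 module caches prepared statements by SQL text,
            # so the loop re-runs one statement instead of re-parsing it.
            start = time.perf_counter()
            for _ in range(polls):
                conn.execute("PRAGMA data_version").fetchone()
            elapsed = time.perf_counter() - start
        finally:
            conn.close()
    return elapsed / polls * 1e6


if __name__ == "__main__":
    print(f"{average_poll_cost_us():.2f} us per poll")
```

This only measures the time spent inside the query on an idle database. It says nothing about the cost of waking the CPU a thousand times a second, which is the other half of the argument in this thread.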

I wonder if using a tiny Redis instance, or even something like LevelDB would be even more efficient.

The point of the file-watch APIs is that you don't need to poll at all - free is better than cheap.
Think of the battery!

(read that in the way of "think of the children!")

to me it sounds like they asked it to not make a kernel file watcher, and now it writes that into every comment everywhere, despite that not even being in the implementation
Yup
Respectfully (thanks haha) - yeah, probably right. The original intent was to use an inotify-type thing, but I avoided per-platform differences at the outset. This was definitely a for-fun project that blew up unintentionally, and I am working to harden/improve it.

Love Fly.

One of the things people seem to forget is that SQLite itself polls every millisecond or so to grab a lock.

So yes, don't use this in a mobile device, or a server if you want to let the CPU enter a low power state.

Otherwise, a single thread doing this in an otherwise idle server doesn't seem that terrible. And if it's not idle, inotify won't help you (you need to query what changed afterwards).
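To make the lock-polling point concrete, here is a small Python sketch of a writer that cannot get the lock. It assumes the stdlib `sqlite3` module, whose `timeout` argument sets SQLite's busy timeout. The function name `blocked_writer_wait` is made up, and how long the blocked connection actually sleeps between retries depends on how the SQLite library was built.

```python
import os
import sqlite3
import tempfile
import time
from typing import Tuple


def blocked_writer_wait(timeout_s: float = 0.2) -> Tuple[bool, float]:
    """Return (gave_up, seconds_waited) for a writer blocked by another one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "busy.db")
        holder = sqlite3.connect(path, isolation_level=None)
        waiter = sqlite3.connect(path, timeout=timeout_s, isolation_level=None)
        try:
            holder.execute("CREATE TABLE t (x INTEGER)")
            holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
            start = time.perf_counter()
            try:
                # SQLite retries inside this call until the busy timeout expires.
                waiter.execute("BEGIN IMMEDIATE")
                gave_up = False
            except sqlite3.OperationalError:
                gave_up = True  # typically "database is locked"
            waited = time.perf_counter() - start
        finally:
            if holder.in_transaction:
                holder.execute("ROLLBACK")
            waiter.close()
            holder.close()
    return gave_up, waited


if __name__ == "__main__":
    print(blocked_writer_wait())
```

The waiting connection is not notified when the lock frees up. It keeps retrying until the timeout runs out, so the same wake-up concern applies to it.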

Appreciated your input on the original thread as well. Maybe I should note this recommendation in the docs or something.
If you're not making any changes to the database, does the SELECT "kill" you?

And if you are making changes, don't you have to poll regardless after the file watcher wakes you?

For WAL mode, SQLite can probably satisfy this query just by inspecting some shared memory. But it is busy waiting, sure.

SQLite has a wal hook which calls you back every time a transaction is committed to the WAL. https://www.sqlite.org/c3ref/wal_hook.html
That only catches changes made by the database connection being "hooked."

This has a thread running in the background trying to catch changes made by other connections, potentially (I'm not sure here, but I suspect as much) in different processes that are modifying the same database.

Good point. But IME, and as seems to be widely understood, writing from multiple connections is a bit of a minefield in SQLite. And AFAIK it would still be possible to have a hook on all connections you expect to be writing?
That wouldn't work across processes. And if you only care about in-process queuing then you might find it easier/faster to use another kind of storage or roll your own WAL.
i did a quick benchmark on this with a single db connection updating user_version in a tight loop with the wal_hook callback enabled.

on my crappy old i5 with the db file on /dev/shm it can do ~150k writes a second with the wal_hook callback called on every write. and this is using JS bindings to C++ so has some unnecessary overhead.

A prepared `PRAGMA data_version` is likely quite cheap to run because it hits the same page every time…

…but some other push-based IPC mechanism would be a lot more battery friendly
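A minimal sketch of what polling `PRAGMA data_version` buys you, assuming Python's stdlib `sqlite3` module and two connections to the same file standing in for two processes. The helper names `data_version` and `demo` and the `events` table are made up for illustration.

```python
import os
import sqlite3
import tempfile
from typing import Tuple


def data_version(conn: sqlite3.Connection) -> int:
    """Read the connection's current data_version counter."""
    return conn.execute("PRAGMA data_version").fetchone()[0]


def demo() -> Tuple[bool, bool]:
    """Return (watcher_saw_change, writer_saw_change) after one commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # Autocommit mode, so each statement commits on its own.
        writer = sqlite3.connect(path, isolation_level=None)
        watcher = sqlite3.connect(path, isolation_level=None)
        try:
            writer.execute("PRAGMA journal_mode=WAL")
            writer.execute(
                "CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)"
            )
            watcher_before = data_version(watcher)
            writer_before = data_version(writer)
            writer.execute("INSERT INTO events (body) VALUES ('hello')")
            # The watcher's value moves because another connection committed.
            watcher_saw_change = data_version(watcher) != watcher_before
            # The writer's own value is documented not to move for its own commits.
            writer_saw_change = data_version(writer) != writer_before
        finally:
            watcher.close()
            writer.close()
    return watcher_saw_change, writer_saw_change


if __name__ == "__main__":
    print(demo())
```

Per the SQLite docs, `data_version` only changes for commits made by other connections, so the watcher sees a change and the writer does not. The value only says that something changed. You still have to query to find out what.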

> one lightweight SELECT per millisecond

For the low, low cost of $1 per minute, you can also lease a supercar.

Yeah, I had the same instinct - this feels very much like a "nice idea" but the execution falls short. I mean - busily banging on sqlite like this? Shit at that point just use Redis.
For what it's worth, Kine (software that k3s uses to replace etcd with SQL databases) implements etcd watches on SQLite through polling[1]. The reason being that SQLite does not offer NOTIFY/LISTEN like MySQL and Postgres do. Ironically, Honker attempts to implement NOTIFY/LISTEN through polling.

k3s has been running on my home server for about three years now (using the default SQLite backend), and there doesn't seem to be excessive CPU usage despite dozens of watches existing in the simulated etcd. Of course, this doesn't say much about Honker, but it's nonetheless worth pointing out that sometimes the choice of database forces one towards a certain design.

[1] https://github.com/k3s-io/kine/blob/648a2daa/pkg/logstructur...

With SQLite, you're basically funneled towards a single-writer / single-process design anyway ... in which case why not use a more traditional condvar + mutex rather than polling?
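A rough sketch of the condvar + mutex alternative, in Python, assuming every writer lives in the same process and calls `notify()` right after it commits. `CommitSignal` and its methods are hypothetical names, not part of SQLite or of the project under discussion.

```python
import threading
from typing import List, Optional


class CommitSignal:
    """Hypothetical in-process signal: writers notify, consumers block."""

    def __init__(self) -> None:
        self._cond = threading.Condition()  # bundles a mutex and a condvar
        self._version = 0

    def notify(self) -> None:
        """Call after a write transaction commits."""
        with self._cond:
            self._version += 1
            self._cond.notify_all()

    def wait_for_change(self, last_seen: int, timeout: Optional[float] = None) -> int:
        """Sleep until the version differs from last_seen, or the timeout passes."""
        with self._cond:
            self._cond.wait_for(lambda: self._version != last_seen, timeout=timeout)
            return self._version


def demo() -> int:
    commits = CommitSignal()
    seen: List[int] = []

    def consumer() -> None:
        seen.append(commits.wait_for_change(last_seen=0, timeout=5.0))

    worker = threading.Thread(target=consumer)
    worker.start()
    commits.notify()  # stands in for "a write transaction just committed"
    worker.join()
    return seen[0]


if __name__ == "__main__":
    print(demo())
```

The consumer thread sleeps in the kernel until it is woken, so there are no periodic wake-ups. The catch is that this cannot see writes made by other processes.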
Are you trying to avoid sleep?
I'm not even saying it's unworkable, just, my intuition is not that the "lightweight per-millisecond select" is an optimal design.
Really might be in sqlite. I've learned to never trust my intuition about performance with that thing. So many times I've gone to "optimize" something and discovered that the naive hack way I had been doing it was faster anyway. It's built for this sort of bullshit.
Maybe, I'm really writing about the language on this page, not about the design (I responded about this upthread).
Oh, yes, I see what you mean now.
What's the CPU usage? Like 2%?

I had a manual fs polling thing a while back. It was ugly (low time budget, didn't wanna mess with the native watchers), just scanned the whole thing once per second. It averaged out to like 0.3% CPU.

Not elegant, but acceptable for my purposes! (Small-ish directory, and "ping me within a second or two" was realtime enough for this use case.)

If this stops the core being able to drop to a lower power state it can be whole multiples of power use on some devices.

Wake ups are death for mobile form factors, even if not really doing much work.

This is a pretty good argument against the way we do operating systems now, right?
Why? Most modern OSs are "tickless" - where there's no regular scheduling tick and it can sleep pretty much indefinitely if there's no work.
i mean, technically this is once per millisecond, so this would happen 1000x more. In your case due to the kernel overhead you would likely not even be able to do it (300% CPU?).

Either way this does seem like a very large overhead due to the fact that there's just no other way to do it without a deeper kernel integration which might be outside the scope of what sqlite is trying to do.

If the fs tree scanned once per second had 1000 files, it would be once per millisecond for a file.