Hacker News new | ask | show | jobs
by adityaathalye 5 days ago
Thanks for the benching, Anders! So grateful for the stuff you've shared over the years. Invariably, every single post has been useful and/or educational to me.

I read this post more as an illustration of the *value* of UUIDv7 as primary key, over integer primary keys, in lieu of minimal loss of read/write performance, and marginally more data on disk bloat.

SQLite's automatic integer rowID primary key is a no-brainer, when the SQLite application is local-only, such as application storage format (mobile and desktop). Or is never intended to grow beyond a single server instance. Basically, where each SQLite file is private to a singular instance of the application.

However, if there is even an outside chance of needing to cooperate across application instances, e.g. the minimal limit case of a personal knowledge base that should seamlessly sync across a person's devices, as well as a hosted service, then a high-quality sequential random ID starts to make a lot more sense. (No-brainer arbitrary table merges / splits / remerges, de-duplication, etc.)

Random ID primary key is a bad idea period, whether it be the UU kind or the SQ kind, or any other kind. As far as my DB knowledge goes, this class of ID destroys all tree-algorithms, and we are stuck with the fact that there is no practically better way, than an appropriate tree-structure, to group and organise a meaningful amount of data, efficiently and effectively.

2 comments

I've updated the article with the correct rowid alias (integer not int) so the rowid version is now 715ms. I've also added an example of rowid and a secondary index UUID4, and that also seems to be bad for performance (as although it's not a clustered index it's still random inserts into a b-tree).
Well, I expect to never need WITHOUT ROWID. And even if such an arcane situation hits my system, WITHOUT ROWID has so many ifs and buts that I'll probably elect to eat the $$$ cost of running an un-optimised normie SQLite as far as possible.

cf. https://sqlite.org/withoutrowid.html

> The WITHOUT ROWID syntax is an optimization. It provides no new capabilities. Anything that can be done using a WITHOUT ROWID table can also be done in exactly the same way, and exactly the same syntax, using an ordinary rowid table. The only advantage of a WITHOUT ROWID table is that it can sometimes use less disk space and/or perform a little faster than an ordinary rowid table.

As of now, I am doing the following in my (Bitemporal data system) experiment (When will it see the light of day? Nobody knows.).

All data are globally uniquely identified by a UUIDv7. However all tables have `rowid` integer primary key asc (which is just an alias for SQLite's autoincrement int id). The `rowid` is the basis for joins, and is the foreign key reference. This lets me offload some useful disambiguation work to the DB as well as have it enforce global (across data systems) record uniqueness guarantees, while retaining local (within process) query efficiency by retaining the ability to use integer rowids.

While the idealised insert performance in your bench is indeed mind-boggling, the DB Schema isn't doing anything CPU-intensive during inserts (checks, constraints, triggers etc.). My schema / query pattern yields comparatively meagre throughput, but I am happy with the ballpark it has landed in, given all the work I'm making SQLite do for me on each `assert!` and `redact!`.

cf. my dirty-but-useful-enough bench, with production-like record content:

A poor man's napkin-mathy, append-only SQLite write/read benchmark

https://gist.github.com/adityaathalye/3c8195dc70626b33c23867...

Summary:

  ;; Okay, I think I can live with this...

  ;; - "facts" table: 12M+ records
  ;;     - single process writes to it
  ;;     - ~ 400 transactions/second
  ;;     - append-only table, enforced via SQLite "before" triggers
  ;; - "now" table: 
  ;;     - updates on every assert/redact on "facts" table, via triggers
  ;;     - currently at "limit case": for each read it is empty, or very small, because writes do back-to-back assert/redact of the same fact
  ;;     - gets reads from two reader threads (evenly split)
  ;;     - ~41,000 reads/second
  ;; - all reads are concurrent with writes (poor man's futures)
Aside: Specific to SQLite...

Thanks to its oh so convenient automatic integer rowIDs, I believe one can amortise some of the other overheads of UUIDv7s for "in-between" queries, viz. indices, joins, ctes, virtual tables etc., with appropriate schema / query design.