Hacker News new | ask | show | jobs
by sly010 2333 days ago
I have been using SQLite as a format to move data between steps in a complicated batch processing pipeline.

With the right pragmas it is both faster and more compact than JSON. It is also much more "human readable" than gigabytes of JSON.

I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

7 comments

> I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

The sqlite3_deserialize() interface was created for this very purpose. https://www.sqlite.org/c3ref/deserialize.html

If the language's sqlite bindings don't offer a way to load a database from a string, if you're on a modern linux kernel (3.17+) you can make use of the memfd_create syscall: it creates an anonymous memory-backed file descriptor equivalent to a tmpfs file, but no tmpfs filesystem needs to be mounted and there's no need to think about file paths.
You can use the memvfs module to load in-memory databases if you're using the C API. I'm not sure how many higher-level APIs support it though.

[1] https://stackoverflow.com/a/53453338/3063 [2] https://www.sqlite.org/loadext.html#example_extensions [3] https://www.sqlite.org/src/file/ext/misc/memvfs.c

A very interesting approach is sqltorrent (https://github.com/bittorrent/sqltorrent): the sqlite file is shared in a torrent, and all queries will touch a specific part of the file, which is downloaded on-demand.

Also check https://github.com/lmatteis/torrent-net

Incredibly odd, but so awesome

  $ mount -t tmpfs none /some/path
  $ write db.sqlite /some/path/db.sqlite
  $ read db.sqlite

We've been abusing tmpfs for more than 10 years to get around the IO layer's failings. It's probably still a valid pattern.
This is a amazing, I think you may have just solved and headed-off a huge number odd problems for me.

Could you talk more about what Pragmas you’ve been using and why?

Not the OP, but I find `PRAGMA synchronous = OFF` makes the creation of DBs vastly faster ...
> I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

Ramfs?

tmpfs is the better-behaved option should you run out of resources, see:

https://www.jamescoyle.net/knowledge/951-the-difference-betw...

I'm still remembering old-school ramdisks under Linux which were finite in both number and size, both to quite small extents. I think there were 8 (or 12 or 16?) total ramdisks available, of only 2-4 MB each, configurable with LILO boot options.

That's now ... mostly taking up valuable storage in my own brain for no useful effect.

It looks like a good intro, thanks. I wasn't aware of these technologies, but I knew it was possible to build an FS in RAM. So I just put these two keywords together.
FWIW, I learned a few things researching my answer.

(A prime validation for answering questions, BTW.)

My first read was that the old-school ramfs / ramdisk limitations still held. I can't actually even find documentation on them, though I'm pretty sure I'm not dreaming this.

Circa 2.0 kernal IIRC, possibly earlier.

OK, some traces remain, see:

https://www.tldp.org/HOWTO/Bootdisk-HOWTO/x1143.html

Note that this is OBSOLETE information.

What pragmas do you use? It sounds amazing!