| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by sly010 2333 days ago

I have been using SQLite as a format to move data between steps in a complicated batch processing pipeline.

With the right pragmas it is both faster and more compact than JSON. It is also much more "human readable" than gigabytes of JSON.

I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

7 comments

SQLite 2333 days ago

> I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

The sqlite3_deserialize() interface was created for this very purpose. https://www.sqlite.org/c3ref/deserialize.html

link

paulfurtado 2333 days ago

If the language's sqlite bindings don't offer a way to load a database from a string, if you're on a modern linux kernel (3.17+) you can make use of the memfd_create syscall: it creates an anonymous memory-backed file descriptor equivalent to a tmpfs file, but no tmpfs filesystem needs to be mounted and there's no need to think about file paths.

link

palotasb 2333 days ago

You can use the memvfs module to load in-memory databases if you're using the C API. I'm not sure how many higher-level APIs support it though.

[1] https://stackoverflow.com/a/53453338/3063 [2] https://www.sqlite.org/loadext.html#example_extensions [3] https://www.sqlite.org/src/file/ext/misc/memvfs.c

link

rakoo 2333 days ago

A very interesting approach is sqltorrent (https://github.com/bittorrent/sqltorrent): the sqlite file is shared in a torrent, and all queries will touch a specific part of the file, which is downloaded on-demand.

Also check https://github.com/lmatteis/torrent-net

link

kevas 2332 days ago

Incredibly odd, but so awesome

link

0xbadcafebee 2333 days ago

  $ mount -t tmpfs none /some/path
  $ write db.sqlite /some/path/db.sqlite
  $ read db.sqlite

We've been abusing tmpfs for more than 10 years to get around the IO layer's failings. It's probably still a valid pattern.

link

FridgeSeal 2333 days ago

This is a amazing, I think you may have just solved and headed-off a huge number odd problems for me.

Could you talk more about what Pragmas you’ve been using and why?

link

jjgreen 2333 days ago

Not the OP, but I find `PRAGMA synchronous = OFF` makes the creation of DBs vastly faster ...

link

fctorial 2333 days ago

> I only wish there was a way to open an http-fetched SQLite database from memory so I don't have to write it to disk first.

Ramfs?

link

dredmorbius 2332 days ago

tmpfs is the better-behaved option should you run out of resources, see:

https://www.jamescoyle.net/knowledge/951-the-difference-betw...

I'm still remembering old-school ramdisks under Linux which were finite in both number and size, both to quite small extents. I think there were 8 (or 12 or 16?) total ramdisks available, of only 2-4 MB each, configurable with LILO boot options.

That's now ... mostly taking up valuable storage in my own brain for no useful effect.

link

fctorial 2332 days ago

It looks like a good intro, thanks. I wasn't aware of these technologies, but I knew it was possible to build an FS in RAM. So I just put these two keywords together.

link

dredmorbius 2332 days ago

FWIW, I learned a few things researching my answer.

(A prime validation for answering questions, BTW.)

My first read was that the old-school ramfs / ramdisk limitations still held. I can't actually even find documentation on them, though I'm pretty sure I'm not dreaming this.

Circa 2.0 kernal IIRC, possibly earlier.

OK, some traces remain, see:

https://www.tldp.org/HOWTO/Bootdisk-HOWTO/x1143.html

Note that this is OBSOLETE information.

link

DelightOne 2333 days ago

What pragmas do you use? It sounds amazing!

link