| Hello, and thank you for the notes. Unfortunately, your points seem to be mostly wrong, so let me clarify them a little. Do not worry; many people misunderstand SQLite and its abilities. - Single implementation: Sure. Working with SQLite convinced me that nobody cared to reimplement it, as it worked so well that nobody wanted or needed to rework it. I may write an unpacker just to prove that it is not hard at all to read the SQLite format. The complicated part is the SQL engine (and many other features that are not used in Pack), and for Pack, you can live without it. - SQLite does not require a disk. It has a memory option. Pack can have piping and probably will. I did not implement it because, well, it is too new, and I do what I feel is needed first. You can subscribe to the newsletter on the site (https://pack.ac/notes) or follow GitHub. - Of course you can read SQLite without reading the whole file. It is a database, not a tar file. - SQLite is highly optimized to read the lowest amount of data, and it has layers of smart caching. There is a reason it is used on almost any device that has a computer in it, even smartwatches. - Of course the archive is safe from changes during unpacking. It will be opened in read-only mode, guarded by the OS and file system, and Pack also uses code isolation, which prevents calling write on any file. - There are a lot of tools that help repair a damaged SQLite file. Pack too is guarded with transactions. The file will not get corrupted unless the disk goes corrupt; the mentioned tools come in handy then. And in today's world of SSDs, the risk is shrinking rapidly. - On unpack, Pack reads, decompresses, checks, and writes in a multithreaded fashion. So yes, parallel reading is possible and done in Pack. - I suggest trying Pack for yourself. It gives you the feeling you need to have to be sure. |
A less common tar feature is streaming the archive while it is being packed -- stuff like "ssh remote tar cvf - ... > local-file.tar", which skips the temporary file on the remote machine and also saves a lot of time in transfer.
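To make the streaming point concrete, here is a minimal sketch using Python's standard tarfile module in its stream modes ("w|" and "r|"), which only ever write or read forward. The WriteOnlyPipe class and the two helper functions are made up for illustration; they just stand in for the ssh pipe in the command above.

```python
import io
import tarfile


class WriteOnlyPipe(io.RawIOBase):
    """Hypothetical stand-in for a pipe: accepts writes, cannot seek."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def writable(self):
        return True

    def seekable(self):
        return False

    def write(self, b):
        self.chunks.append(bytes(b))
        return len(b)


def stream_tar(members, out):
    """Write (name, data) pairs to `out` as a tar stream, never seeking."""
    with tarfile.open(fileobj=out, mode="w|") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def read_tar_stream(src):
    """Yield (name, data) pairs while reading the tar stream front to back."""
    with tarfile.open(fileobj=src, mode="r|") as tar:
        for info in tar:
            f = tar.extractfile(info)
            yield info.name, (f.read() if f else b"")
```

The reader gets the first member as soon as its bytes arrive, because each tar header sits directly in front of its data and nothing at the end of the file is needed to interpret it.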
But for both of those, sqlite's "memory" option won't help you - memory or not, you still need to have the entire file to read it. So if you just store file contents in the SQL database, then you have to fetch everything up to the last byte before you can get any data out.
Maybe you can keep an index in sqlite and append the data as-is... but where would you put that index?
If you put it in front (like squashfs), you need to produce the entire metadata before writing the first data byte, and that should include compressed sizes too (assuming you want to support random extraction), which means you cannot stream the file out until you finish compressing all the data. Also, sometimes you will not be able to add files to the archive without rewriting the whole archive (if the index grows and you didn't leave enough padding). This might be OK, but it definitely should be mentioned.
If you put it at the end (like zip), you will be able to stream the file out during compression, but fast decompression would be impossible. Also, you'll forgo any sqlite transactional guarantees, since the database will be created in memory and only written at the very end, once all the files are written.
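Here is a rough sketch of the "index at the end" layout, just to show where the trade-off comes from. This is a made-up toy format (zlib blobs, then a JSON index, then an 8-byte footer holding the index offset); it is not Pack's format, not zip's, and not how sqlite lays out its pages.

```python
import io
import json
import struct
import zlib

# Hypothetical footer: 8-byte little-endian offset of the index.
FOOTER = struct.Struct("<Q")


def write_archive(members, out):
    """Stream compressed blobs to `out`, then append the index and footer."""
    index = {}
    offset = 0
    for name, data in members:
        blob = zlib.compress(data)
        out.write(blob)  # data can leave as soon as it is compressed
        index[name] = (offset, len(blob))
        offset += len(blob)
    out.write(json.dumps(index).encode())
    out.write(FOOTER.pack(offset))  # index starts where the data ended


def read_member(src, name):
    """Extract one member; needs a seekable source that has the file's tail."""
    src.seek(-FOOTER.size, io.SEEK_END)
    index_end = src.tell()
    (index_start,) = FOOTER.unpack(src.read(FOOTER.size))
    src.seek(index_start)
    index = json.loads(src.read(index_end - index_start))
    offset, length = index[name]
    src.seek(offset)
    return zlib.decompress(src.read(length))
```

The writer never seeks, so it can feed a pipe. The reader's very first step is a seek to the end, so it cannot start until the last byte has arrived, which is the problem with streaming extraction described above.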
So frankly, I don't see how you can win on the streaming front, unless you really have a custom format and "sqlite3" is just a small part of it.
(Another problem is that there is not even a short spec: how sqlite3 is used, what your schema is, and so on. And I am sorry, but I am not going to read the source code just to figure this stuff out.)