Hacker News new | ask | show | jobs
by lbutler 1658 days ago
SEEKING FREELANCER | $2,000USD bounty - Work complete, and bounty donation at users request

I'm working on an open-source library in a niche civil engineering field. I have a proprietary file format that I would like to decode and support.

I am almost certain the file is an SQLite database with the header changed, though my attempts to reconstruct the header and find any other changes have failed.

If you think you are up to the challenge, I've linked to some example files below, the first person who can share the procedure to update the file that it will open in a native SQLite reader I will pay a $2,000USD bounty or donate to a charity (your choice).

Email in profile, I'll post below if an answer has been accepted.

edit: This has been decoded and the bounty will be donated! Waiting for confirmation from the person who provided the script to decode the file and I will share results and proof of donation

2 comments

Many thanks to tsholmes for finding a solution and sharing it with me.

He has requested the full amount be donated to Nine Lives Foundation, proof of donation can be seen here: https://ninelivesfoundation.networkforgood.com/

I'll be posting the script and the format documentation shortly

I would be very interested if you or tsholmes could discuss the thought process behind identifying and reverse engineering the format!
Sure thing!

The sqlite file format spec (https://www.sqlite.org/fileformat.html) is fairly short. I looked at hexdumps of where the db header should be, and a few page headers, and it seemed to line up, but not everything was filled in correctly. Trying to patch the header fields to match the sqlite spec and running it through sqlite3 didn't work, so I figured there was some more structural differences.

Since the spec is pretty short, I wrote a small parser for the sqlite files, printing out hexdumps of various sections as I went to check against the spec. The cell pointers in the btree leaf pages were missing, so I just tried reading them out in order from the cell data section and that worked (the pointers are really just an optimization to skip over cells when scanning). The table schemas are stored differently, but its a text format so its pretty easy to read (comma-separated columnName:type).

The record format was the most different part. Instead of type-tagged values like in sqlite, these records just had the column count and offsets of values. The offsets can be 1 or 2 bytes depending on the length of the record, which tripped me up for a bit. This was mostly figured out by looking at hexdumps of records. The values have to be interpreted based on the schema of the table the record is in.

The file format used little-endian for the record values, despite everything else in the file being big-endian. The schema in these files allowed array-of-vals and array-of-structs column types that I haven't decoded yet, but they don't seem to be on any of the important tables, mostly file metadata tables. There are some untyped binary columns containing an array of 64-bit floats, which was found by comparing against the csv of the same data.

Not counting some of the binary columns which I haven't decoded yet, the alternate version of the various sections were still pretty simple, so I was able to figure them out by eyeballing it. There weren't any numeric enums (outside of the ones in the sqlite spec) or flag masks that would have taken a while to decipher.

I had been working on this myself! I haven’t dug into SQLite to this level before, but found the header easy to correct. I got stumped when eyeballing the additional pages of information.

I really enjoyed learning more about b-trees and binary encoding formats taking this on. My primary exposure to those concepts has come from reading Designing Data-intensive Applications — so having something concrete to work on to test that knowledge was fun.

I’m excited to review your script. Great work!

I've had confirmation from a couple of sources now that this is indeed a SQLite file, or at least a derivative of it, with some key information missing.

If you email me how you know it is a SQLite file and what is missing, I can also send you the data that is contained within the files if that will be helpful to you.