Hacker News new | ask | show | jobs
by MaknMoreGtnLess 1574 days ago
Would this be possible using SQLite:

- I point it to a bunch of JSON files on disk. They have similar schema but not exact (slight variations)

- I SQLite import them into one virtual table (I would rather not literally import them - think PostgreSQL JSON FDW)

- Then index certain fields so looking up records by certain fields is very fast (faster than having to "FTS" through all the files)

- Allow fuzzy text search on certain fields (say a field was company name and other fields were city, street and human names)

All the while (best case) not actually having to import the files into the DB (it's ok if the indices need to be rebuilt everytime)

3 comments

There's a SQLite virtual table mechanism for reading from CSV files on disk but I haven't seen the equivalent for JSON. In any case provided you have the disk space importing JSON is fast and the single resulting .db file is easy to tolerate in my opinion.

Unless you're talking about 10+GBs of JSON I'd recommend importing them and seeing how far you get.

Once the JSON has been loaded into tables the other things you want to do should all be very feasible.

- Fuzzy text search can be done using SQLite trigram indexes https://www.sqlite.org/fts5.html#the_experimental_trigram_to...

- I'd split JSON columns that you want to index out into indexed regular columns - there are a bunch of tricks in sqlite-utils for doing that, see https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/

Do you think unqlite would be a better fit?

Have you looked at unqlite?

Not really. I mean, yes, with unlimited effort you can but you'd have to write your own virtual table implementation. There is no existing JSON virtual table module. You'd have to do the indexing yourself in your virtual table implementation; it's not possible to add an SQLite index on a virtual table. If you imported the JSON data into SQLite tables you still can't do it, because SQLite doesn't support GIN indexes. You would have to extract the data from the JSON into separate columns and then index those columns. Now you've got an index, but you still can't do a fuzzy text search on it. You'd have to use FTS5, or create yet more columns with something like a Soundex of the original value. I don't think SQLite is a good fit here. PostgreSQL is the clear winner.
Do you think unqlite would be a better fit?

Have you looked at unqlite?

Yes! I tried this exact usecase using sqlite-utils as a python lib.

I basically made a python script that open each of the json file and insert it into a sqlite inmemory db using sqlite-utils insert.

Then you have a regular sqlite db (in memory) that you can work it!