by nlohmann 1465 days ago
Have you ever played with SQLite virtual tables (https://sqlite.org/vtab.html)? They could let you provide an SQLite interface while keeping the same structure on disk. It requires a bit of work (implementing the interface can be tedious), but it can avoid the conversion in the first place.
2 comments

Good point. Actually CommonCrawl provides Parquet files for their archives too.

And there's this Parquet virtual table extension for SQLite: https://github.com/cldellow/sqlite-parquet-vtable

But for my use case virtual tables would be too complicated.

DuckDB would probably be a way better option; it works amazingly well on top of Parquet (https://duckdb.org/docs/data/parquet).
Then again, do you need virtual tables? The .warc structure won't change, so the tables won't change. But you can have SQL views defined instead for common queries.
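To make the views idea concrete, here's a rough sketch using Python's built-in sqlite3 module. The table name, columns, and sample rows are all made up for illustration (I don't know the actual schema in use); the point is only that a fixed table plus a view for a common query needs no virtual table machinery.

```python
import sqlite3

# Hypothetical schema: one row per WARC record. All names here are
# assumptions for illustration, not the real layout of any tool.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE warc_records (
        record_id   TEXT PRIMARY KEY,
        warc_type   TEXT,
        target_uri  TEXT,
        warc_date   TEXT,
        http_status INTEGER,
        payload     BLOB
    )
    """
)

# Made-up sample rows standing in for parsed .warc content.
conn.executemany(
    "INSERT INTO warc_records VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("r1", "response", "https://example.com/", "2021-01-01T00:00:00Z", 200, b"<html>ok</html>"),
        ("r2", "response", "https://example.com/missing", "2021-01-01T00:00:01Z", 404, b""),
        ("r3", "request", "https://example.com/", "2021-01-01T00:00:00Z", None, b""),
    ],
)

# A view wraps a common query so callers can treat it like a table.
conn.execute(
    """
    CREATE VIEW ok_responses AS
    SELECT target_uri, warc_date, length(payload) AS payload_bytes
    FROM warc_records
    WHERE warc_type = 'response' AND http_status = 200
    """
)

rows = conn.execute("SELECT target_uri, payload_bytes FROM ok_responses").fetchall()
print(rows)
```

The view stores no data of its own; it is re-evaluated against warc_records on every query, so it stays in sync as rows are added.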