|
|
|
|
|
by eknkc
706 days ago
|
|
Is there a good general purpose solution where I can store a large read only database in s3 or something and do lookups directly on it? Duckdb can open parquet files over http and query them but I found it to trigger a lot of small requests reading bunch of places from the files. I mean a lot. I mostly need key / value lookups and could potentially store each key in a seperate object in s3 but for a couple hundred million objects.. It would be a lot more managable to have a single file and maybe a cacheable index. |
|
That’s… the whole point. That’s how Parquet files are supposed to be used. They’re an improvement over CSV or JSON because clients can read small subsets of them efficiently!
For comparison, I’ve tried a few other client products that don’t use Parquet files properly and just read the whole file every time, no matter how trivial the query is.