by jiggawatts
702 days ago
> trigger a lot of small requests reading bunch of places from the files. I mean a lot.

That’s… the whole point. That’s how Parquet files are supposed to be used. They’re an improvement over CSV or JSON because clients can read small subsets of them efficiently! For comparison, I’ve tried a few other client products that don’t use Parquet files properly and just read the whole file every time, no matter how trivial the query is.
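A rough sketch of why small reads are the intended access pattern: a Parquet file ends with its metadata, then a 4-byte little-endian length of that metadata, then the magic bytes `PAR1`. A reader can fetch just the tail, then just the metadata, and from there only the column chunks a query needs. `RangeReader` below is a hypothetical in-memory stand-in for a remote store that issues byte-range requests, and the "metadata" is fake (real Parquet metadata is Thrift-encoded), so treat this as an illustration, not a real client.

```python
import struct

MAGIC = b"PAR1"


class RangeReader:
    """Hypothetical stand-in for a remote object store.

    Serves byte ranges from an in-memory buffer and records each request,
    the way an HTTP client would send `Range: bytes=start-end` headers.
    """

    def __init__(self, blob: bytes):
        self.blob = blob
        self.requests: list[tuple[int, int]] = []

    def size(self) -> int:
        return len(self.blob)

    def read(self, start: int, length: int) -> bytes:
        self.requests.append((start, length))
        return self.blob[start:start + length]


def read_footer(reader: RangeReader) -> bytes:
    """Fetch only the footer metadata with two small range reads."""
    size = reader.size()
    # Layout of the end of a Parquet file:
    #   <metadata><4-byte little-endian metadata length>"PAR1"
    tail = reader.read(size - 8, 8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (meta_len,) = struct.unpack("<I", tail[:4])
    # Second request: the metadata block, which (in a real file) records
    # the byte offsets of every column chunk in every row group.
    return reader.read(size - 8 - meta_len, meta_len)


# A fake file shaped like Parquet: 1 MB of "column data" plus fake metadata.
data = b"x" * 1_000_000
meta = b"fake-metadata"
blob = MAGIC + data + meta + struct.pack("<I", len(meta)) + MAGIC

r = RangeReader(blob)
footer = read_footer(r)
print(footer, r.requests)
```

Only 21 bytes are transferred to locate everything in the file; the 1 MB of data is never touched until a query asks for specific columns. Real readers often fetch a larger speculative tail in one request to save a round trip.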
DuckDB can also query a remote DuckDB database; in that case it looks like there is caching, which might be better.
I wonder if anyone has actually worked on a file format specifically for this use case (relatively high-latency random access) that minimizes reads to as few blocks as possible.
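Short of a new format, one trick for high-latency storage is to coalesce nearby byte ranges: when two requested ranges are separated by a small gap, fetch the gap too and issue one request instead of two. Some Parquet readers do this (Apache Arrow's pre-buffering, for example, exposes a hole-size limit). The function below is a minimal sketch of the idea; `coalesce` and `max_gap` are hypothetical names, and ranges are half-open `(start, end)` byte offsets.

```python
def coalesce(ranges: list[tuple[int, int]], max_gap: int) -> list[tuple[int, int]]:
    """Merge half-open (start, end) byte ranges separated by <= max_gap bytes.

    Trades a few wasted bytes for fewer round trips, which pays off when
    per-request latency dominates transfer time.
    """
    merged: list[list[int]] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough (or overlapping): extend the previous request.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


wanted = [(0, 100), (150, 300), (10_000, 10_500), (10_600, 11_000)]
print(coalesce(wanted, max_gap=1024))
```

With `max_gap=1024` the four reads become two requests; with `max_gap=0` they stay as four. The right gap size depends on the latency and bandwidth of the store.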