Hacker News new | ask | show | jobs
by winrid 709 days ago
Depends on the machine :) hours maybe?
2 comments

So for a one-off query DuckDB is tons faster, and easier.
yeah, but it's not high performance, which was my original point.

I spend a lot of time optimizing stuff developers thought was "high performance" and they're scanning a 100gb+ dataset on every page load.

If the alternative takes hours, then ~30 seconds seems high performance to me.
DuckDB is indeed great for one off stuff.

But calling 30s high performance for quantifying 60k unique values like in this case is misleading at best. I try not to mislead people. :)

I can help answer this a bit.

Processing the data with code and SQLite takes about 4 hours for the United States and Canada. If I can figure out how to parallelize some of the work, it could be faster.

The benefit of using SQLite is its low memory footprint. I can have this 40GB file (the uncompressed version) and use a 256MB container instance. It shows how valuable SQLite is in terms of resources. It takes up disk space, but I followed up with the zstd compression (from the post).

Was this more work and should've I have used Postgres and osm2pql, yes. Was it fun, well yeah, because I have another feature, which I have not written about yet, which I highly value.