Hacker News new | ask | show | jobs
by wenc 709 days ago
I was able to find all 10 Whole Foods in the City of Chicago in 22.6s with DuckDB. It's amazing! (there are tons more Whole Foods in the metro area, but it found the exact 10 in the city)

        SELECT 
            tags['addr:city'][1] city,
            tags['addr:state'][1] state,
            tags['brand'][1] brand,
            *, 
        FROM st_readosm('us-latest.osm.pbf')
        WHERE 1=1
        and city = 'Chicago'
        and state = 'IL'
        and brand = 'Whole Foods Market'
I'm sure there are ways to make this faster (partitioning, indexing, COPY TO native format, etc.) but querying a 9.8GB compressed raw format file with data (in key-value fields stored as strings) for the entire United States at this speed is pretty impressive to me.
2 comments

That's great. I was impressed with DuckDB's speed with the OSM data. I'm doing it with all the data, but I intended to have: - a single file that can be queried - a single file that can be easily compressed

When I index the entire United States, admittedly with a subset of tags, I can search in milliseconds. I'll take it.

1-2 seconds in Overpass which also uses OSM:

https://overpass-turbo.eu/s/1NTy