by simonw 463 days ago
OK the way you're publishing the data with Parquet and making it accessible through DuckDB is spectacular.

Your README shows R and Python examples: https://github.com/dfsnow/opentimes?tab=readme-ov-file#using...

I got it working with the `duckdb` terminal tool like this:

  INSTALL httpfs;
  LOAD httpfs;
  ATTACH 'https://data.opentimes.org/databases/0.0.1.duckdb' AS opentimes;

  SELECT origin_id, destination_id, duration_sec
    FROM opentimes.public.times
    WHERE version = '0.0.1'
        AND mode = 'car'
        AND year = '2024'
        AND geography = 'tract'
        AND state = '17'
        AND origin_id LIKE '17031%'
    LIMIT 10;

Lately DuckDB is becoming a serious competitor for my use of Datasette because it eliminates a step in most of my workflows: converting CSV to SQLite.

I've been thinking about how to swap it in as a backend for Datasette (maybe as a plugin?), but it seems inherently riskier, since for my use case it needs to at the very least be able to read a folder to list all the CSVs available. If I could hook that up with its native S3 support I'd be unstoppable (at work).
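
One way to limit the folder-reading risk mentioned above is to enumerate CSVs from a single allowed directory and expose each one as a view. The following is only a minimal sketch under stated assumptions: `list_csv_tables` and `view_statements` are hypothetical helper names, not part of Datasette or DuckDB, and the only DuckDB feature relied on is its `read_csv_auto()` table function. The code builds SQL strings and does not import or call DuckDB itself.

```python
from pathlib import Path
from typing import Dict, List


def list_csv_tables(root: str) -> Dict[str, Path]:
    """Map a table name to each CSV directly inside `root` (no recursion).

    Hypothetical helper: only files whose resolved location is the allowed
    folder itself are kept, so symlinks pointing elsewhere are skipped.
    """
    base = Path(root).resolve()
    tables: Dict[str, Path] = {}
    for path in sorted(base.glob("*.csv")):
        real = path.resolve()
        if real.is_file() and real.parent == base:
            tables[path.stem] = real
    return tables


def view_statements(tables: Dict[str, Path]) -> List[str]:
    """Build one CREATE VIEW statement per CSV using read_csv_auto()."""
    statements = []
    for name, path in tables.items():
        # Escape quotes so odd file names cannot break out of the SQL.
        quoted_name = '"' + name.replace('"', '""') + '"'
        quoted_path = "'" + str(path).replace("'", "''") + "'"
        statements.append(
            f"CREATE VIEW {quoted_name} AS "
            f"SELECT * FROM read_csv_auto({quoted_path});"
        )
    return statements
```

Each generated statement could then be run on a DuckDB connection, which would make every CSV in the folder queryable by name without a separate conversion step. Restricting the listing to one resolved directory is a design choice for this sketch, not something the thread prescribes.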

I have a medium term ambition to make Datasette backends a plugin mechanism, and the two I am most excited about are DuckDB and PostgreSQL.
Thanks! I hadn't seen anyone do it this way before with a very large, partitioned dataset, but it works shockingly well as long as you're not trying to `SELECT *` the entire table. Props to the DuckDB folks.

Eventually I plan to add some thin R and Python wrapper packages around the DuckDB calls just to make it easier for researchers.

I blogged a few more notes here: https://simonwillison.net/2025/Mar/17/opentimes/
Nice! I know a couple of projects that have been using this pattern.

- https://bsky.app/profile/jakthom.bsky.social/post/3lbarcvzrc...

- https://bsky.app/profile/jakthom.bsky.social/post/3lb4y65z24...

- https://skyfirehose.com

Love this distribution pattern. Users can go to the Parquet files or attach to your "curated views" on a small DuckDB database file.