Hacker News
by JackC 509 days ago
Hmm, I can put them here for now: https://source.coop/harvard-lil/data-gov-metadata

Unfortunately it's a bit messy because we weren't initially thinking about tracking deletions. data_20241119.jsonl.zip (301k rows) and data_20250130.jsonl.zip (305k rows) are simple captures of the API on those dates. data_db_dump_20250130.jsonl.zip (311k rows) is a SQLite dump of all the entries we saw at some point between those dates. My hunch is that the roughly 6,000-row gap between the 311k and 305k sets is something like 4,000 false positives and 2,000 deletions, but that could be way off.
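For anyone who wants to poke at the gap, here is a rough sketch of how the candidate deletions could be listed: load the identifiers from the full dump and from the latest capture, then take the set difference. It assumes each archive holds a single .jsonl member with one JSON object per line, and that every object has a unique `id` field; both are guesses about the layout, not something confirmed above, so adjust `key` to whatever the records actually use. The helper names are made up for illustration.

```python
import io
import json
import zipfile


def load_ids(zip_path, key="id"):
    """Collect the `key` value of every record in the first .jsonl member of a zip.

    Assumption: one JSON object per line, each carrying a `key` field.
    """
    ids = set()
    with zipfile.ZipFile(zip_path) as zf:
        member = next(n for n in zf.namelist() if n.endswith(".jsonl"))
        with zf.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                line = line.strip()
                if line:  # skip blank lines
                    ids.add(json.loads(line)[key])
    return ids


def candidate_deletions(seen_ids, latest_ids):
    """Identifiers seen at some point but absent from the latest capture."""
    return seen_ids - latest_ids


# Example usage (file names from the comment above, paths assumed local):
# seen = load_ids("data_db_dump_20250130.jsonl.zip")
# latest = load_ids("data_20250130.jsonl.zip")
# print(len(candidate_deletions(seen, latest)))
```

The result is only a list of candidates: a plain set difference can't tell real deletions apart from the false positives, so each entry would still need checking some other way.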

1 comment

Very cool! I'll take a look :)