| So something big got approved a few miles from my house -- a data center complex -- which I found out about through a local news outlet. The story sparked my curiosity, and I soon went down a rabbit hole of city government websites and public data to see what other projects might be in the works. Then the thought occurred to me: what if I could just... scrape all of it? One API led to another, and I ended up writing 200+ scrapers across 85 cities.

It turns out that when the City of Columbus uses Accela, the City of Austin uses Amanda, the City of Chicago has its own thing, and half the other cities dump CSVs on an FTP server that may or may not be online, "just scrape it all" stops being simple quickly.

Some things I learned along the way:
- There is no standard for permit data. Every city invents its own schema.
- Geocoding more than a million addresses sounds straightforward until you realize that half of the addresses are things like "LOT 4 BLK 2 UNIT 7".
- Government APIs have rate limits that appear to have been set by someone who assumed no one would use them.
- The estimated cost field is a work of creative fiction. A $200 million data center will sometimes be listed at $1.

PermitRadar is the result -- an interactive map + search across 1.6M+ results. You can look up any city, filter by date/cost/type, and see what's going on. If you care about a specific address (homeowner, contractor, investor), you can set up alerts that notify you when new permits are filed. The city pages (e.g. /permits/los-angeles-ca) are server-rendered and public -- no login required.

The stack is Express/TypeScript + Next.js + PostGIS + Redis + BullMQ. Scrapers run on a cron job and feed a queue that handles geocoding, normalization, and AI classification (Claude Haiku 4.5).

I'm happy to answer any questions about scraping, the data normalization hellscape, or anything else under the sun. |
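To make the normalization lessons above concrete, here is a minimal TypeScript sketch of mapping two differently shaped city records onto one schema. It is not PermitRadar's actual code: the `NormalizedPermit` shape, the per-city field names, the lot/block regex, and the "anything at or below $1 is a placeholder" threshold are all assumptions made for illustration.

```typescript
// Hypothetical common schema; field names are illustrative only.
interface NormalizedPermit {
  city: string;
  permitId: string;
  address: string;
  estimatedCostUsd: number | null; // null when the source value looks like a placeholder
  geocodable: boolean; // false for legal descriptions like "LOT 4 BLK 2"
}

type RawRecord = Record<string, string | number | null | undefined>;

// Per-source mapping: which raw key holds each normalized field.
interface FieldMap {
  permitId: string;
  address: string;
  cost: string;
}

// Matches legal-description style addresses such as "LOT 4 BLK 2 UNIT 7".
const LOT_BLOCK = /^\s*LOT\s+\w+\s+(BLK|BLOCK)\s+\w+/i;

// Assumed threshold: costs at or below this are treated as placeholders.
const PLACEHOLDER_COST_MAX = 1;

// Parses "$200,000,000"-style values; returns null for junk or placeholder costs.
function parseCost(value: unknown): number | null {
  if (value === null || value === undefined) return null;
  const n =
    typeof value === "number"
      ? value
      : Number(String(value).replace(/[$,\s]/g, ""));
  if (!Number.isFinite(n) || n <= PLACEHOLDER_COST_MAX) return null;
  return n;
}

// Maps one raw record onto the common schema using that city's field map.
function normalize(city: string, raw: RawRecord, map: FieldMap): NormalizedPermit {
  const address = String(raw[map.address] ?? "").trim();
  return {
    city,
    permitId: String(raw[map.permitId] ?? ""),
    address,
    estimatedCostUsd: parseCost(raw[map.cost]),
    geocodable: address.length > 0 && !LOT_BLOCK.test(address),
  };
}

// Two made-up sources with different schemas.
const cityA: FieldMap = { permitId: "PERMIT_NUM", address: "SITE_ADDR", cost: "JOB_VALUE" };
const cityB: FieldMap = { permitId: "id", address: "location", cost: "valuation" };

const a = normalize(
  "Example City A",
  { PERMIT_NUM: "B-1001", SITE_ADDR: "LOT 4 BLK 2 UNIT 7", JOB_VALUE: "$1" },
  cityA,
);
const b = normalize(
  "Example City B",
  { id: 42, location: "123 Main St", valuation: "$200,000,000" },
  cityB,
);

console.log(a, b);
```

In this sketch, a record flagged `geocodable: false` would skip the geocoder rather than waste a request on it, and a `null` cost keeps obviously fake values out of any cost filter. A real pipeline would likely need many more heuristics than these two.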