by toomuchtodo 3077 days ago
You'd use a browser extension, scoped to requests to the sites you're interested in, and stream the captured data back to your infrastructure for processing. You're limited only by your install base and your ingest infrastructure.
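
To make that concrete, here's a rough sketch of the two pieces an extension like this would need: a scope check on the request URL and a buffer that batches captured records before shipping them off. Every name here (`SCOPED_HOSTS`, `inScope`, `IngestBuffer`, the example hosts) is made up for illustration, and this is not how any particular extension is actually written.

```typescript
// Hypothetical sketch: scope filtering plus batching for an ingest pipeline.
// All names and host values are assumptions for illustration only.

interface CapturedRecord {
  url: string;
  capturedAt: number; // milliseconds since epoch
  body: string;
}

// Hosts the extension is scoped to (placeholder values).
const SCOPED_HOSTS: string[] = ["example.com", "docs.example.org"];

// True when the URL's host equals a scoped host or is a subdomain of one.
function inScope(rawUrl: string, hosts: string[] = SCOPED_HOSTS): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname;
  } catch {
    return false; // not a parseable URL
  }
  return hosts.some((h) => host === h || host.endsWith("." + h));
}

// Buffers in-scope records and hands them to `send` once a batch is full.
class IngestBuffer {
  private pending: CapturedRecord[] = [];
  private readonly batchSize: number;
  private readonly send: (batch: CapturedRecord[]) => void;

  constructor(batchSize: number, send: (batch: CapturedRecord[]) => void) {
    this.batchSize = batchSize;
    this.send = send;
  }

  // Returns false (and drops the record) when the URL is out of scope.
  add(record: CapturedRecord): boolean {
    if (!inScope(record.url)) return false;
    this.pending.push(record);
    if (this.pending.length >= this.batchSize) this.flush();
    return true;
  }

  // Sends whatever is buffered, if anything.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = [];
    this.send(batch);
  }
}
```

In a real extension, `send` would presumably wrap a POST to your own ingest endpoint, and the records would come from a content script or a request listener; both are left out here since they depend on your setup.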

Recap [1] does this to extract PACER court documents, which are in the public domain but remain hard to access because of draconian public policy.

[1] https://free.law/recap/