|
|
|
|
|
by accrual
907 days ago
|
|
Maybe TypeScript for a typed, familiar and easy to read/write language, and either an internal scheduler or an external one (e.g. cron). A file with per-website rules/scraping hints stored as JSON on disk or in a database, unless it's supposed to be dynamic/one ruleset to rule them all. If you need to go faster, you could retrieve the data (wget/curl/some lib) and pass it to some binary (C/Rust) for processing into a database at core speed. |
|