by accrual 907 days ago
Maybe TypeScript, since it's typed, familiar, and easy to read and write, with either an internal scheduler or an external one (e.g. cron). Keep the per-website rules/scraping hints as JSON on disk or in a database, unless it's supposed to be dynamic or one ruleset to rule them all. If you need to go faster, you could retrieve the data (wget/curl/some lib) and pass it to a native binary (C/Rust) that processes it into a database at native speed.
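
To make the per-website rules idea concrete, here's a rough sketch. Everything in it is my own assumption, not a known design: the `SiteRule` shape, the choice of regex patterns as the "scraping hints", and the names `parseRules`, `findRule`, `extract` and `scrapeOnce`. The download step is passed in as a function so it could be fetch, a curl wrapper, or some lib.

```typescript
// Hypothetical rule shape: one entry per website, one regex per field.
// Each pattern is expected to contain one capture group with the value to keep.
interface SiteRule {
  host: string;
  fields: Record<string, string>;
}

type Extracted = Record<string, string | null>;

// Parse a JSON ruleset, e.g. the contents of a rules file or a database column.
function parseRules(json: string): SiteRule[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("ruleset must be a JSON array");
  }
  return data.map((entry, i) => {
    const ok =
      typeof entry?.host === "string" &&
      typeof entry?.fields === "object" &&
      entry.fields !== null;
    if (!ok) {
      throw new Error(`rule ${i} is malformed`);
    }
    return { host: entry.host, fields: entry.fields as Record<string, string> };
  });
}

// Pick the rule whose host matches the URL's hostname exactly.
function findRule(url: string, rules: SiteRule[]): SiteRule | undefined {
  const host = new URL(url).hostname;
  return rules.find((rule) => rule.host === host);
}

// Apply every field pattern to the page; fields that do not match become null.
function extract(html: string, rule: SiteRule): Extracted {
  const out: Extracted = {};
  for (const name of Object.keys(rule.fields)) {
    const match = new RegExp(rule.fields[name], "s").exec(html);
    out[name] = match?.[1]?.trim() ?? null;
  }
  return out;
}

// One scrape pass. `download` is injected so the retrieval method stays swappable.
async function scrapeOnce(
  url: string,
  rules: SiteRule[],
  download: (url: string) => Promise<string>
): Promise<Extracted | null> {
  const rule = findRule(url, rules);
  if (!rule) {
    return null; // no rule for this site, nothing to do
  }
  return extract(await download(url), rule);
}
```

For the internal scheduler option you could call `scrapeOnce` from `setInterval`; for the external one, run the script from a cron entry and let it exit after one pass. Regex hints are just the simplest thing to store as JSON; CSS selectors with an HTML parser would likely be more robust on real pages.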