HN Mirror


by iranu 88 days ago

Thanks for the comment, great question.

Quick clarification: the AI agent writes the config once and is out of the loop after that. You run crawls yourself or via cron. So the "auto-regenerate and silently get wrong data" scenario doesn't quite apply since there's no agent in the runtime loop.

But configs going stale is a real problem. Two things help:

1. The agent tests each config on 5 real pages before saving it. If any field comes back empty, it rewrites the config before it hits production.

2. `./scrapai health --project <n>` tests all your spiders and flags extraction failures. We run it monthly via cron. Broken spider? Point the agent at it, and it re-analyzes and fixes it.
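To make the "empty fields" rule concrete, here is a minimal sketch of that kind of check. It is not scrapai's actual code: the function names, the dict-per-page record shape, and the `required_fields` list are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Mapping


def is_empty(value: object) -> bool:
    """Treat None, blank strings, and empty containers as 'empty'."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, dict, set)):
        return len(value) == 0
    return False


def empty_field_counts(
    pages: Iterable[Mapping[str, object]],
    required_fields: List[str],
) -> Dict[str, int]:
    """Return {field: number of sample pages where that field was empty}.

    Hypothetical helper: `pages` is assumed to be one extracted record
    (a dict) per sample page. Fields with no empty values are omitted.
    """
    counts = {field: 0 for field in required_fields}
    for page in pages:
        for field in required_fields:
            if is_empty(page.get(field)):
                counts[field] += 1
    return {field: n for field, n in counts.items() if n > 0}


def config_passes(
    pages: List[Mapping[str, object]],
    required_fields: List[str],
) -> bool:
    """A config passes only if there are sample pages and no empty fields."""
    return bool(pages) and not empty_field_counts(pages, required_fields)
```

With five sample records, `config_passes(samples, ["title", "body"])` returning `False` would be the signal to rewrite the config rather than save it.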

The gap: result count drops (your 500 to 450 example). Health checks catch broken extraction, not "fewer pages matched." We list structural change detection as an open contribution area in the README.
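For anyone who wants to cover that gap on their own side, a rough sketch of a count-drop check might look like the following. Nothing here exists in scrapai: `count_dropped` and the 5% default tolerance are assumptions, and you would have to record the item counts from each crawl yourself.

```python
def count_dropped(previous: int, current: int, tolerance: float = 0.05) -> bool:
    """Return True if the result count fell by more than `tolerance`.

    Hypothetical helper, not part of scrapai. `previous` and `current`
    are item counts from two consecutive crawls of the same project;
    `tolerance` is the fractional drop you are willing to ignore.
    """
    if previous <= 0:
        # No baseline to compare against, so nothing can be flagged.
        return False
    drop = (previous - current) / previous
    return drop > tolerance


# The example from the thread: 500 results last run, 450 this run.
if count_dropped(500, 450):
    print("result count dropped, worth re-checking the config")
```

A fixed threshold is the crudest option. It will misfire on sites whose page counts swing naturally, so comparing against a rolling average of recent runs is probably the next step.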