Hi HN, I'm the author. I originally built SiteOne Crawler in PHP+Swoole back in 2023.
Last year I rewrote it entirely in Rust: 25% faster execution and 30% lower memory usage
than the PHP version, plus a single native binary with zero runtime dependencies.
The feature I'm most excited about is CI/CD quality gating. The idea is simple:
crawl your entire website after deploy and block the pipeline if quality regresses.
It crawls every page, scores the results across 5 categories (Security, Performance, SEO,
Accessibility, Best Practices) on a 0–10 scale, and exits with code 10 if any
threshold is breached, which fails the pipeline step. Drop the single binary into
GitHub Actions, GitLab CI, or any other pipeline: no Docker, no Node, no runtime needed.
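To make the gating idea concrete, here is a minimal Rust sketch of the decision step only. It is not SiteOne Crawler's actual code. It assumes the per-category scores (0–10) have already been computed and that each gated category has a minimum score. The names `breached_categories`, `gate_exit_code`, and the threshold map are hypothetical and are not the crawler's API or CLI options.

```rust
use std::collections::HashMap;
use std::process::ExitCode;

/// Exit code for a failed gate. 10 matches the value described above; 0 means pass.
/// (Hypothetical constant name, for illustration only.)
const EXIT_QUALITY_GATE_FAILED: u8 = 10;

/// Returns the categories whose score is strictly below their configured minimum.
/// Categories without a threshold are not gated. A category that has a threshold
/// but no score counts as breached (fail closed), which is an assumption of this sketch.
fn breached_categories(
    scores: &HashMap<&str, f64>,
    thresholds: &HashMap<&str, f64>,
) -> Vec<String> {
    let mut breached: Vec<String> = thresholds
        .iter()
        .filter(|&(category, &min)| scores.get(category).map_or(true, |&score| score < min))
        .map(|(category, _)| category.to_string())
        .collect();
    breached.sort(); // HashMap order is random; sort for stable log output
    breached
}

/// Maps the gate result to a process exit code: 0 if every gate passes, 10 otherwise.
fn gate_exit_code(scores: &HashMap<&str, f64>, thresholds: &HashMap<&str, f64>) -> u8 {
    if breached_categories(scores, thresholds).is_empty() {
        0
    } else {
        EXIT_QUALITY_GATE_FAILED
    }
}

fn main() -> ExitCode {
    // Made-up example scores on the 0-10 scale.
    let scores = HashMap::from([
        ("Security", 8.5),
        ("Performance", 6.0),
        ("SEO", 9.1),
        ("Accessibility", 7.2),
        ("Best Practices", 8.0),
    ]);
    // Hypothetical minimums. Only categories listed here are gated.
    let thresholds = HashMap::from([("Security", 8.0), ("Performance", 7.0)]);

    for category in breached_categories(&scores, &thresholds) {
        eprintln!("quality gate breached: {category}");
    }
    ExitCode::from(gate_exit_code(&scores, &thresholds))
}
```

With these sample numbers, Performance (6.0) falls below its 7.0 minimum. The program reports the breach on stderr and exits with 10. A CI runner treats any nonzero exit as a failed step, so the pipeline stops there.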
Beyond CI/CD, it also does:
- Offline website archiving with a built-in HTTP server for self-hosting
- Full-site markdown export with deduplicated content (great for feeding to LLMs)
- Interactive HTML audit reports you can email via built-in SMTP
- Sitemap generation