by asphero
184 days ago
Interesting approach. The scraper-vs-site-owner arms race is real.

On the flip side of this discussion - if you're building a scraper yourself, there are ways to be less annoying:

1. Run locally instead of from cloud servers. Most aggressive blocking targets VPS IPs. A desktop app using the user's home IP looks like normal browsing.

2. Respect rate limits and add delays. Obvious but often ignored.

3. Use RSS feeds when available - many sites leave them open even when blocking scrapers.

I built a Reddit data tool (search "reddit wappkit" if curious) and the "local IP" approach basically eliminated all blocking issues. Reddit is pretty aggressive against server IPs but doesn't bother home connections.

The porn-link solution is creative though. Fight absurdity with absurdity I guess.
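For points 2 and 3, a rough sketch of what "add delays" and "use RSS" could look like in Python. This is only an illustration, not how any particular tool does it: `PoliteFetcher`, `parse_rss_items`, and the 2-second default delay are made-up names and values, and the parser assumes a plain RSS 2.0 feed (Atom feeds use namespaced `entry` elements and would need different handling).

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional


class PoliteFetcher:
    """Wraps any fetch function and enforces a minimum gap between calls.

    Hypothetical helper: the fetch function, clock, and sleep are injected
    so the delay logic can be exercised without touching the network.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        min_delay: float = 2.0,  # assumed value; tune to the site's stated limits
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def get(self, url: str) -> str:
        # Sleep only for whatever part of the minimum gap has not yet elapsed.
        if self._last is not None:
            wait = self._min_delay - (self._clock() - self._last)
            if wait > 0:
                self._sleep(wait)
        self._last = self._clock()
        return self._fetch(url)


def parse_rss_items(xml_text: str) -> List[Dict[str, str]]:
    """Extract title and link from each <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
            }
        )
    return items
```

In real use the `fetch` argument could be a small wrapper around `urllib.request.urlopen` that returns the decoded body; the feed URL itself has to come from the site, since not every site publishes one.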
It should also be easy to detect a Forgejo, Gitea, or similar hosting site, locate the git URL, and clone the repo.
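A minimal sketch of that idea in Python, under a few assumptions: the instance is served from the site root, repository pages follow the usual `/{owner}/{repo}/...` layout, and the clone URL is the same path with `.git` appended. The detection step relies on the `/api/v1/version` endpoint that Gitea and Forgejo expose; other forges will need a different probe. `guess_clone_url`, `version_endpoint`, and `looks_like_forge` are hypothetical names, and the JSON fetcher is injected so nothing here hits the network by itself.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit


def guess_clone_url(page_url: str) -> Optional[str]:
    """Derive a likely HTTPS clone URL from any page inside a repository.

    Heuristic only: assumes the first two path segments are owner and repo.
    Non-repo pages with two or more segments (e.g. /explore/repos) will
    produce a bogus URL, so the caller should verify before cloning.
    """
    parts = urlsplit(page_url)
    segments = [s for s in parts.path.split("/") if s]
    if parts.scheme not in ("http", "https") or not parts.netloc or len(segments) < 2:
        return None
    owner, repo = segments[0], segments[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{parts.scheme}://{parts.netloc}/{owner}/{repo}.git"


def version_endpoint(page_url: str) -> str:
    """Build the version-probe URL, assuming the forge lives at the site root."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/api/v1/version"


def looks_like_forge(page_url: str, fetch_json: Callable[[str], dict]) -> bool:
    """Return True if the version probe answers with a 'version' field.

    fetch_json is supplied by the caller (hypothetical); any error it raises
    is treated as "not a Gitea-style forge".
    """
    try:
        return "version" in fetch_json(version_endpoint(page_url))
    except Exception:
        return False
```

If both checks pass, a single `git clone` of the derived URL fetches the whole history in one go, which is far cheaper for the host than crawling every commit and blame page over HTTP.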