by golergka
491 days ago
I'm building a service that needs to extract RSS feeds from pages (hntorss.com, if you're interested). Nothing else. From any rational point of view, a website owner would actively want this parser to work as easily as possible: the whole point is for users to see the content you publish! Alas, I still get rate-limited, hit with 400s, and otherwise blocked because of my user agent and other bot-detection mechanisms.
No, the whole point (for most sites) is to make money off the users visiting the site (currently via advertising).
A third-party service that slurps the data and redirects users to a different site to consume it means the original site loses the revenue but still pays the bandwidth cost.
So it's understandable that many sites want to block such agents.