|
|
|
|
|
by mikepavone
124 days ago
|
|
As someone with a self-hosted Mercurial instance dealing with this, I will say that the big names (OpenAI included, but not exclusively them) generally at least use proper user-agents and respect robots.txt, but they are still needlessly aggressive compared to traditional search indexers. There are also scrapers that are hiding behind normal browser user agents. When I looked at IP ranges, at least some of them seemed to be coming from data centers in China. |
|