by gizmo686
763 days ago
I don't get what Bytedance is doing here. Clearly they are not actively trying to evade blocks, as they are identifying their bot with a user agent that sites can block. However, surely they have enough smart engineers to realize that running a bot at full speed (and, based on other reports, completely ignoring robots.txt) will get them blocked by a lot of sites. If they just had a well-behaved spider, almost no one would mind. Getting crawled is a fact of life on the internet, and most website owners recognize it as an essential cost of doing business. Once you get a reputation as a bad spider, though, that is very hard to shake.