| HN Mirror



	by zalebz 1594 days ago
	It could be related to all of the "knock-off" websites that scrape StackExchange data. Maybe they send their traffic out through various Tor nodes, and the exit IPs get blacklisted as a result of reading thousands of pages too rapidly.


capableweb 1594 days ago

Given my experience with the network quality of Tor, I'd be surprised if A) scraping were efficient to do over Tor and B) Stack Overflow would even notice it. As I said, the network is too slow, so it can't add that much traffic compared to the absolutely staggering amount of traffic they get from non-Tor sources.


jks 1594 days ago

Also, what would be the point? It's easier to download a data dump from https://archive.org/details/stackexchange


raverbashing 1594 days ago

You would be surprised/disappointed at the amount of abuse the bigger sites have to handle.

Things like this: https://news.ycombinator.com/item?id=26072025


capableweb 1594 days ago

Interesting, but it doesn't fit the context of Stack Exchange blocking Tor. Your example is about a mobile app hotlinking an image of a flower, which seems easy enough to block/fix, while Stack Exchange blocking all Tor users from even reading Stack Overflow doesn't make much sense.


rovr138 1593 days ago

One image gets abused enough, they dig, and find the culprit is an app.

Fixes: updating the app, blocking the image, blocking all requests with empty user agents.

------

One person abuses SO enough, they dig, and find the culprit is someone using the Tor network.

Fixes: identify the user and ask them to stop, or block all traffic from Tor.

-------

Do you propose an alternative fix?
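
For concreteness, the two blunt fixes above (dropping requests with empty user agents, blocking all Tor traffic) could look roughly like the sketch below. This is only an illustration, not how Stack Exchange or anyone else actually does it: `should_block`, `TOR_EXIT_IPS`, and the addresses in it are all made up (the IPs are documentation-range placeholders), and a real setup would load a published exit list rather than hardcode one.

```python
from ipaddress import ip_address
from typing import Optional

# Hypothetical stand-in for a list of Tor exit addresses.
# These are documentation-range placeholder IPs, not real exit nodes.
TOR_EXIT_IPS = {ip_address("192.0.2.10"), ip_address("198.51.100.7")}


def should_block(remote_ip: str, user_agent: Optional[str], block_tor: bool = True) -> bool:
    """Return True if the request should be rejected under the blunt rules above."""
    # Rule 1: reject requests with a missing or blank User-Agent header.
    if not user_agent or not user_agent.strip():
        return True
    # Rule 2: optionally reject anything coming from a known Tor exit address.
    if block_tor and ip_address(remote_ip) in TOR_EXIT_IPS:
        return True
    return False
```

Calling it with `block_tor=False` keeps only the user-agent rule, which is the narrower of the two fixes.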


zalebz 1594 days ago

I wasn't aware that existed, and obviously that would make any scraping utterly useless.
