by dumbfounder 4931 days ago
I had problems with people scraping Twicsy so hard that it was taking the site down. For a while I would manually review the top IP addresses requesting pages a couple of times per day, look for patterns, and ban IPs based on that. Then I wrote a script based on the patterns I recognized to do it automatically.
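
For anyone wanting to try the same thing, here is a rough sketch of the "top IP addresses" step in Python. It is not the actual Twicsy script. The function name `top_requesters`, the threshold, and the log format (client IP as the first whitespace-separated field, as in common access logs) are all assumptions for illustration.

```python
from collections import Counter


def top_requesters(log_lines, threshold):
    """Return {ip: request_count} for IPs with at least `threshold` requests.

    Assumption: each log line starts with the client IP, followed by whitespace.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split(None, 1)
        if parts:  # skip empty or whitespace-only lines
            counts[parts[0]] += 1
    # most_common() orders the result from busiest to quietest IP
    return {ip: n for ip, n in counts.most_common() if n >= threshold}


# Made-up sample lines using documentation-reserved IP ranges
sample = [
    '203.0.113.7 - - [01/Jan/2012:00:00:01 +0000] "GET /a HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2012:00:00:02 +0000] "GET /b HTTP/1.1" 200 512',
    '203.0.113.7 - - [01/Jan/2012:00:00:03 +0000] "GET /c HTTP/1.1" 200 512',
    '198.51.100.4 - - [01/Jan/2012:00:00:04 +0000] "GET /a HTTP/1.1" 200 512',
]

ban_candidates = top_requesters(sample, threshold=3)
print(ban_candidates)
```

This only produces candidates. Whether to ban them still depends on the patterns you see, since a high request count alone can also come from a legitimate proxy.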

But then I just made Twicsy fast enough to deal with the traffic so I don't need to worry about it anymore. I guess it depends on your business model whether or not that will work for you.

1 comment

We've blocked a few of the worst, but mostly just added servers to deal with the load.

We actually found out who one of the worst ones was and contacted them. It turned out to be a major legit proxy, but they had a bug in their proxy code that caused it to refetch one of our URLs over and over. They were very easy to work with and they fixed the bug.