Hacker News new | ask | show | jobs
by immibis 588 days ago
You're thinking too much by the rules. You can absolutely scrape them anyway. Probably the biggest relevant factor is CGNAT and other technologies that make you blend in with a crowd. If I run a scraper on my cellphone hotspot, the site can't block me without blocking a quarter of all cellphones in the country.

If the site is less aggressively blocking but only has a per-IP rate limit, buy a subscription to one of those VPNs (it doesn't matter if they're "actually secure" or not - you can borrow their IP addresses either way). If the site is extremely aggressive, you can outsource to the slightly grey market for residential proxy services - for fifty cents to several dollars per gigabyte, so make sure that fits in your business plan.

There's an upper bound to a website's aggressiveness in blocking, before they lose all their users, which tops out below how aggressive you can be in buying a whole bunch of SIM cards, pointing a directional antenna at McDonald's, or staying a night at every hotel in the area to learn their wi-fi passwords.

1 comments

> You're thinking too much by the rules. You can absolutely scrape them anyway. Probably the biggest relevant factor is CGNAT and other technologies that make you blend in with a crowd. If I run a scraper on my cellphone hotspot, the site can't block me without blocking a quarter of all cellphones in the country.

I am familiar with most of that, and there is a BIG difference between trying to find a workaround for one site, that you scrape ocasionaly, than to to find workaround for all of the sites.

Big sites will definitely put entire ISP's behind annoying capachas that are designed to stop exactly this (if you ever wonder why you sometimes get capatchas that seem slow to load, have long animations, or other annoying slow things, that is why etc.)

And once you start making enough money to employ all the people you need for doing that consistently, they will find a jurisdiction or 3 where they can sue you.

Also good luck finding residential/mobile ISP's that will stand by, and not try to throttle you after a while.

You definitively can get away with doing all of that for a while, but you absolutely can't build sustainable businesses on that.

There are many rationalizations to not try.