by maxlin 1087 days ago
It was already semi-hard to machine-read, which is why I use Nitter for my small-scale continuous scraping of Twitter, which is now temporarily broken. Nitter is tons easier to parse as it's not reliant on JS, etc., and it's simpler to create screenshots of with headless Chrome.

However, if you mean implementing some even worse obfuscation (kind of like FB putting parts of words in different divs, etc.), that is not really compatible with the fact that this needed to be done as more of a temporary emergency measure. And PoW doesn't sound reasonable because it sets mobile devices against the scraper's servers. If all of this was just so easy, scraping would be dead. Good that it isn't.


> And PoW doesn't sound reasonable because it sets mobile devices against the scraper's servers.

Scraper servers and mobile devices have different access patterns though. If I'm reading tweets then I'm fine waiting 1 second for a tweet to load. Page load times for this kind of bloated stuff are super slow anyway, meanwhile my mobile could spend a second or two on some PoW. But if you want to scrape at large scale, you suddenly have to pay for 1bn CPU seconds. And this PoW could even keep continuously increasing per IP, say by 0.1% with every tweet. Not noticeable for the casual surfer sitting on the toilet, neck-breaking for scrapers.
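To make the idea concrete, here is a rough sketch of a hashcash-style PoW where the required work grows per IP. Everything in it is an assumption for illustration, not how Twitter or anyone else does it: the names `PerIpDifficulty`, `solve` and `verify` are made up, SHA-256 is just a convenient hash, and "0.1% with every tweet" is read as compounding growth.

```python
import hashlib
import itertools


def _hash_value(challenge: bytes, nonce: int) -> int:
    """SHA-256 of challenge + nonce, interpreted as a 256-bit integer."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: bytes, target: int) -> int:
    """Client side: brute-force a nonce whose hash value is below target."""
    for nonce in itertools.count():
        if _hash_value(challenge, nonce) < target:
            return nonce


def verify(challenge: bytes, nonce: int, target: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return _hash_value(challenge, nonce) < target


class PerIpDifficulty:
    """Hypothetical tracker: each request from an IP raises its cost by 0.1%."""

    def __init__(self, base_work: float = 1000.0, growth: float = 1.001):
        self.base_work = base_work  # expected hashes for the first request
        self.growth = growth        # assumed compounding factor per request
        self.requests = {}          # ip -> number of requests seen so far

    def expected_hashes(self, ip: str) -> float:
        """Expected number of hashes the next request from this IP will need."""
        return self.base_work * self.growth ** self.requests.get(ip, 0)

    def next_target(self, ip: str) -> int:
        """Return the target for this request, then count the request."""
        work = max(1, round(self.expected_hashes(ip)))
        self.requests[ip] = self.requests.get(ip, 0) + 1
        # A smaller target means fewer hashes qualify, so more work on average.
        return (1 << 256) // work
```

With these assumed numbers the cost is about 2.7x the base after 1,000 requests from one IP and roughly 22,000x after 10,000, so someone reading a few dozen tweets barely notices while a single-IP scraper gets priced out quickly. It does nothing against a scraper that rotates IPs, of course.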

> If all of this was just so easy, scraping would be dead. Good that it isn't.

Small-scale scraping could still be provided through API access or just a login.

The reason they are not doing the "easy" thing is that they don't see a need (yet, perhaps). Just get an account, they'd say, and they are right. It works for Instagram too, except for some weirdos who nobody really cares about.

Of course the scraper would have to pay too. But it makes for a race between how much they are willing to pay, versus how much worse the experience gets for real users. And for successful mobile apps, reducing average load even during active use is important (example: idle games that don't want to turn your phone into a drying iron; companies invest in custom engines and make all kinds of compromises to avoid this). And burst-allowing rate limiting is something I'm quite sure was already in place, especially with prejudice towards datacenter/VPN IPs. But similarly to how it is with search engine scraping, professional scrapers already have costly workarounds for these.
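For anyone unfamiliar with "burst-allowing rate limiting", a token bucket is one common way to do it. This is only a minimal sketch under my own assumptions (the `TokenBucket` class and its parameters are hypothetical, and I have no idea what Twitter actually runs):

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady `rate` per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so the bucket can be tested
        self.tokens = capacity    # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The "prejudice" part would then just be keeping one bucket per IP and handing datacenter/VPN ranges a smaller `capacity` or `rate` than residential ones.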

>The reason they are not doing the "easy" thing is that they don't see a need (yet, perhaps).

This argument just doesn't make any sense. Twitter notes that this is hurting them. Previews in chat apps and just clicking links in non-logged-in contexts are broken. I feel like you're just predicting that this will turn out to be more accepted in the near future and become a more permanent decision, which you don't like.

I'm not fine waiting 1 second.

Most baffling is mobile reddit, where it takes like 6 seconds to load. Do they want us to use their crappy app, or do they just not care?

They're acting like they're desperate for you to use their crappy app.
They’re pulling every underhanded trick in the book to try and force mobile users onto the app. Yeah, I think they want you to use the app.
You can still get a login and have no delay.

For non-auth use, I'd rather wait for 1 second than not have any access at all, which is the current state of affairs.

Maybe that already is the PoW anti-scraping measure, haha.