by pbz 5400 days ago
"Why do you need to enter in a captcha to view the unfiltered reviews?"

Maybe they don't want Google to see them...

2 comments

That's what robots.txt is for, not silly captchas. You put up captchas for the people who ignore that file.
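For anyone following along, here is a rough sketch of what honoring robots.txt looks like from the crawler's side, using Python's standard `urllib.robotparser`. The robots.txt contents, the `/filtered_reviews/` path, and the `may_fetch` helper are made up for illustration and are not Yelp's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the disallowed path is an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filtered_reviews/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_fetch("MyBot", "https://example.com/filtered_reviews/123"))
    print(may_fetch("MyBot", "https://example.com/biz/some-place"))
```

Note that the check runs entirely inside the crawler. Nothing on the server enforces it, so a scraper can simply skip the call.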
I think the idea is that given a large corpus of filtered and unfiltered reviews, you might be able to reverse engineer signals in the algorithm and game the system. If that's your end goal, you and the software you write are likely to ignore robots.txt directives.
Not all spiders honor robots.txt.
Plus, captchas are just such a stupid user experience anyway. If you want to avoid the robots problem, there are plenty of ways to do it without captchas.
Like? Yelp doesn't want the filtered reviews to be accessed in an automated fashion, and preventing that is what a CAPTCHA does. What are the other options?
A quick and dirty solution could be to add a variable to the page using JavaScript after the page has loaded, and only let the link work if that variable exists (and check its value with the server, if you wanted to be more cautious). Not a complete solution, but a first step that is invisible to the user (and a pain in the ass to a robot).
A robot scraping Yelp's deep data is going to be site-specific, and having to scrape another JavaScript variable is not much more than a slight speed bump.
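A minimal sketch of the server half of that idea, assuming the page's JavaScript receives a signed token and sends it back when the link is clicked. The names `SECRET_KEY`, `MAX_AGE_SECONDS`, `issue_token`, and `verify_token` are hypothetical, and the HMAC-plus-timestamp scheme is just one way to check the value with the server, not something described in the thread.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # hypothetical per-server secret
MAX_AGE_SECONDS = 600                 # assumed token lifetime


def _sign(session_id: str, ts: str) -> str:
    """HMAC-SHA256 over the session id and issue timestamp."""
    msg = f"{session_id}:{ts}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_token(session_id: str, now: Optional[float] = None) -> str:
    """Token the page's JavaScript would attach to the link after load."""
    ts = str(int(time.time() if now is None else now))
    return f"{ts}:{_sign(session_id, ts)}"


def verify_token(session_id: str, token: str, now: Optional[float] = None) -> bool:
    """Accept the request only if the token is authentic and not expired."""
    try:
        ts, mac = token.split(":", 1)
        issued = int(ts)
    except ValueError:
        return False  # malformed token
    current = time.time() if now is None else now
    fresh = 0 <= current - issued <= MAX_AGE_SECONDS
    # Compare as bytes so non-ASCII input cannot raise inside compare_digest.
    authentic = hmac.compare_digest(mac.encode(), _sign(session_id, ts).encode())
    return fresh and authentic
```

This only stops clients that never run the page's script. A scraper that fetches the token first, or drives a real browser, passes the check just like a human would.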
Scraping with a normal web browser is utterly trivial. Anything a computer can do, a computer can do. Hence CAPTCHAs.
Yelp does not just want to make scraping impossible; they also seem to be interested in making it harder for humans to view filtered reviews.