| HN Mirror


	by pbz 5400 days ago
	"Why do you need to enter in a captcha to view the unfiltered reviews?" Maybe they don't want Google to see them...

2 comments

SoftwareMaven 5400 days ago

That's what robots.txt is for, not silly CAPTCHAs. You put up CAPTCHAs for the people who ignore that file.
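
For reference, here is a minimal sketch of how a well-behaved crawler would consult robots.txt, using Python's standard `urllib.robotparser`. The robots.txt content, the `/filtered_reviews/` path, and the `ExampleBot` name are assumptions made up for illustration, not Yelp's actual configuration. Note that the check runs entirely on the crawler's side, so it only restrains crawlers that choose to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the disallowed path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filtered_reviews/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler performs this check before every request;
    # nothing forces an impolite one to do so.
    for url in (
        "https://example.com/filtered_reviews/123",
        "https://example.com/biz/some-place",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```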

derwiki 5400 days ago

I think the idea is that, given a large corpus of filtered and unfiltered reviews, you might be able to reverse engineer signals in the algorithm and game the system. If that's your end goal, you and the software you write are likely to ignore robots.txt directives.

pud 5400 days ago

Not all spiders honor robots.txt.

progolferyo 5400 days ago

Plus, CAPTCHAs are just such a stupid user experience anyway. If you want to avoid the robots problem, there are plenty of ways to do it without CAPTCHAs.

jonknee 5400 days ago

Like? Yelp doesn't want the filtered reviews to be accessed in an automated fashion, and preventing that is what a CAPTCHA does. What are the other options?

progolferyo 5400 days ago

A quick and dirty solution could be to add a variable to the page using JavaScript after the page has loaded, and only let the link work if that variable exists (and check its value with the server, if you wanted to be more cautious). Not a complete solution, but a first step that is invisible to the user (and a pain in the ass to a robot).
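
A rough sketch of the server half of that idea, assuming the value is an HMAC-signed timestamp that the page's JavaScript attaches to the link after load. The names `issue_token` and `verify_token`, the five-minute lifetime, and the per-session binding are all illustrative assumptions, not anything Yelp is known to do.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Hypothetical server-side secret; a real deployment would load it from config.
SECRET_KEY = secrets.token_bytes(32)
TOKEN_TTL_SECONDS = 300  # assumed lifetime of a token


def _sign(session_id: str, ts: int) -> str:
    """HMAC-SHA256 over the session id and issue time, hex encoded."""
    msg = f"{session_id}:{ts}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_token(session_id: str, now: Optional[float] = None) -> str:
    """Create the value the page's JavaScript would attach to the link."""
    ts = int(time.time() if now is None else now)
    return f"{ts}:{_sign(session_id, ts)}"


def verify_token(session_id: str, token: str, now: Optional[float] = None) -> bool:
    """Accept the request only if the token is well formed, fresh, and signed."""
    try:
        ts_text, sig = token.split(":", 1)
        ts = int(ts_text)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if not 0 <= current - ts <= TOKEN_TTL_SECONDS:
        return False
    # Constant-time comparison of the supplied and expected signatures.
    return hmac.compare_digest(sig.encode(), _sign(session_id, ts).encode())
```

The server would call `issue_token` when rendering the page and `verify_token` when the link is followed. This only filters out clients that never run or read the page's script.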

waitwhat 5400 days ago

A robot scraping Yelp's deep data is going to be site-specific, and having to scrape another JavaScript variable is not much more than a slight speed bump.
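
To illustrate how small that speed bump is when the value sits in the page source, a site-specific scraper needs little more than one regular expression. The page snippet and the `reviewToken` variable name below are made up for the example.

```python
import re
from typing import Optional

# Hypothetical page source; the variable name and markup are assumptions.
PAGE = """<html><body>
<script>window.reviewToken = "1000:deadbeef";</script>
<a href="/filtered_reviews/123">Show filtered reviews</a>
</body></html>"""

# Matches an assignment such as: window.reviewToken = "value";
TOKEN_RE = re.compile(r'window\.reviewToken\s*=\s*"([^"]+)"')


def extract_token(html: str) -> Optional[str]:
    """Return the token embedded in the page source, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_token(PAGE))
```

A value computed at runtime by the script would not appear in the source like this, so the regular expression alone would not be enough in that case.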

jrockway 5400 days ago

Scraping with a normal web browser is utterly trivial. Anything a computer can do, a computer can do. Hence CAPTCHAs.

soult 5400 days ago

Yelp does not just want to make scraping impossible, they seem to also be interested in making it harder for humans to view filtered reviews.
