| Interesting article. I don't understand the filtered review system at all. Beyond the 'he said / she said' complaints that occasionally come out, there are things about their system that simply don't make any sense unless Yelp is incompetent or slimy. For example: - When you post a review, you as a reviewer think it's unfiltered forever. When you revisit the page as a logged-in user and view a business that has your review, your review is visible. When you log out or log in as another user, the review is filtered and hidden. At the very least, it should tell you your review is filtered; I see no reason to pretend the review is not filtered when the review is legitimate. - When you view unfiltered results, the per-page number mysteriously changes to 10 per page. I don't see any reason why this should change. Plus the results are pretty slow to load, noticeably slower than the results for filtered reviews. - Why do you need to enter a captcha to view the unfiltered reviews? Why would they care if you were a bot only for the unfiltered reviews and not the normal reviews? I don't see the difference, unless they want to prevent people from writing scripts to pull in unfiltered review data. Plus the captcha is fucking horrible; literally half the captchas I get are not readable and I need to refresh. - The filter algorithm seems to be clearly flawed and simply catches way too many reviews that should not be filtered. For example, take this user: http://www.yelp.com/user_details?userid=tZlbsUVo-8wtnR7oMa-3... . The guy has 11 reviews, one 1-star review, one 2-star review and nothing out of the ordinary, and yet his review about Yelp was filtered. Why? His points in the review seemed legitimate. He seems to be a normal user, not a new user, and posts reviews across the board (more good reviews than bad, in fact). They should either fix the algorithm or be more transparent about why reviews are filtered, because I can't understand why a review like that is filtered. |
Caching. I'm fairly sure most unfiltered reviews are cached whereas filtered reviews are not, and reaching past the cache can be expensive. One way to mitigate this is to reduce the number of results you pull per page.
> Why do you need to enter in a captcha to view the unfiltered reviews? Why would they care if you were a bot only for the unfiltered reviews and not the normal reviews?
If you can write a script to deduce the filtering algorithm then you can by definition write reviews that thwart it. With less data, it is harder to deduce the filtering algorithm. In other words, a captcha thwarts high-volume review fraud.
> The filter algorithm seems to be clearly flawed and simply catches way too many reviews that should not be filtered.
I think most people underestimate the difficulty of the problem. Unlike e-mail spam, which is easy for a human to spot, fake reviews are very hard for a human to spot. How can you tell, just from their writing, whether a consumer was prompted into writing a positive review so that they could get a few bucks off their order? You can't. You can look at other statistical trends behind such reviews (such as a sudden wave of positive reviews), but then you're only looking for side effects of the primary problem, and thus you will never achieve perfect performance from a method like this.
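To make the "sudden wave of positive reviews" idea concrete, here is a rough sketch of that kind of side-effect check. To be clear, this is not Yelp's algorithm; the function name, the 4-star cutoff and the z-score threshold are all made up for illustration, and weeks with no positive reviews are simply ignored.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev


def flag_review_bursts(reviews, min_stars=4, z_threshold=2.0):
    """Return ISO (year, week) pairs with an unusually high count of positive reviews.

    `reviews` is an iterable of (date, stars) pairs for one business.
    All names and thresholds here are hypothetical.
    """
    weekly = Counter()
    for day, stars in reviews:
        if stars >= min_stars:
            iso = day.isocalendar()
            weekly[(iso[0], iso[1])] += 1  # key on (ISO year, ISO week)

    if len(weekly) < 2:
        return []  # not enough history to call anything a spike

    counts = list(weekly.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # every week looks the same

    # Flag weeks whose count sits more than z_threshold deviations above the mean.
    return sorted(w for w, c in weekly.items() if (c - mu) / sigma > z_threshold)
```

Note what this can and cannot do: it flags a week, not a review, so a business that legitimately got popular (say, after press coverage) looks exactly like one that handed out coupons for 5-star reviews.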
Yelp takes the (somewhat philosophical) viewpoint that reviews from customers who are coerced into writing them are less genuine than they would be otherwise. I believe that this view drives a lot of their algorithm and possibly threatens its accuracy in a way that is ultimately not worth it. I think there are a number of things Yelp could do to increase users' trust in reviews that don't involve filtering. One simple thing would be for a user's review of an Indian restaurant to show me that user's breakdown of reviews of other Indian restaurants.
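The per-category breakdown is cheap to compute. A minimal sketch, assuming each review is just a (category, stars) pair; the data shape and function name are my own invention, not anything from Yelp's codebase:

```python
from collections import Counter


def rating_breakdown(user_reviews, category):
    """Count one user's reviews in `category` by star rating (1 through 5).

    `user_reviews` is an iterable of (category, stars) pairs (hypothetical shape).
    """
    counts = Counter(stars for cat, stars in user_reviews if cat == category)
    # Include every star level so the display always shows the full distribution.
    return {stars: counts.get(stars, 0) for stars in range(1, 6)}
```

Shown next to a review, a distribution like this lets the reader judge for themselves whether a 5-star rating comes from someone who gives every Indian place 5 stars.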
TL;DR: This is a much harder problem than it seems at first glance, partly because of the nature of the problem and partly because of how Yelp has framed it for themselves.
Disclaimer: I used to work at Yelp, but no longer do. Everyone I worked with there was a stand-up guy.