by incongruity 4068 days ago
The key here is developing a good set of convergent measures, not all of which would have to be available for any one review to be weighted (think a Bayesian scoring approach?):

- Age of account

- Account engagement (patterns matter too)

- Check-ins via a mobile device at the business

- Check-ins at other nearby businesses (patterns matter too)

- Partner with a Credit Card company to offer Yelp-reward bucks to encourage reviews, track usage and validate reviews (and reviewers) [Use this to feed into the engagement score, above]

- Partner with OpenTable (they do this) & prompt reviews after attendance (they do this) – weight these reviews more heavily. [Use this to feed into the engagement score, above]

- Let me actually identify myself to Yelp or to the world (or just to business owners), ONLY if I want to. [Use this to feed into the engagement score, above]

- Reviews of similar businesses (e.g.: I like Thai food) -- patterns matter here. Do I rate all competitors poorly? Did I post all my reviews in one day? Etc.

- Do my reviews vary significantly in a systematic way from others in a category? (This shouldn't be enough on its own, but variance might mean something)

- Do I post photos of the place? (Factor into engagement score, above -- but if it's only for one business, it might be a flag)

etc., etc.
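
To make the "patterns matter" bits a little more concrete, here's a rough sketch of two of them: did I post all my reviews in one day, and do I rate one place far above its competitors? The function names and the input shapes are my own invention for illustration, and I have no idea what Yelp actually computes.

```python
from collections import Counter
from datetime import date

def burst_fraction(review_dates):
    """Fraction of a reviewer's reviews posted on their single busiest day."""
    if not review_dates:
        return 0.0
    busiest_count = Counter(review_dates).most_common(1)[0][1]
    return busiest_count / len(review_dates)

def rating_gap(ratings_by_business, favored):
    """Rating given to `favored` minus the mean rating given to the other
    businesses in the same category (hypothetical input: name -> stars)."""
    others = [r for b, r in ratings_by_business.items() if b != favored]
    if favored not in ratings_by_business or not others:
        return 0.0
    return ratings_by_business[favored] - sum(others) / len(others)

# Three of four reviews on the same day, and one place rated far above rivals.
dates = [date(2014, 1, 1)] * 3 + [date(2014, 2, 1)]
print(burst_fraction(dates))
print(rating_gap({"thai_a": 5, "thai_b": 1, "thai_c": 2}, "thai_a"))
```

Neither number means much on its own; the point is that each becomes one more input to the overall score.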

Really – a statistical model shouldn't be that hard to do -- maybe processor-intensive, but hard? It doesn't feel hard, given all the data they're sitting on...
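
For what it's worth, here's a minimal sketch of the kind of Bayesian scoring I have in mind, assuming a naive-Bayes-style combination where each signal contributes independently. The signal names and likelihood ratios are made up for illustration; a real model would have to estimate them from data, and this is certainly not Yelp's actual method.

```python
import math

# Hypothetical likelihood ratios: P(signal | genuine) / P(signal | fake).
# These numbers are invented for illustration, not measured from any data.
LIKELIHOOD_RATIOS = {
    "old_account": 3.0,
    "checked_in_on_mobile": 5.0,
    "verified_purchase": 8.0,
    "all_reviews_same_day": 0.2,
    "rates_all_competitors_poorly": 0.1,
}

def genuine_probability(signals, prior=0.5):
    """Combine whichever signals were observed into P(review is genuine).

    `signals` maps a signal name to True/False. Signals that are missing
    (or False, a simplification) add no evidence, so no single measure
    has to be available for a review to be scored.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for name, present in signals.items():
        if present and name in LIKELIHOOD_RATIOS:
            log_odds += math.log(LIKELIHOOD_RATIOS[name])
    return 1.0 / (1.0 + math.exp(-log_odds))

# A verified purchase pulls the score up; a competitor-bashing pattern pulls it down.
print(genuine_probability({"verified_purchase": True}))
print(genuine_probability({"verified_purchase": True,
                           "rates_all_competitors_poorly": True}))
```

Working in log-odds means each extra signal is just one more addition, which is why this feels more like a data-plumbing problem than a hard modeling one.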

2 comments

Work has been done on this. Apparently you can detect fake reviews to some degree using text features alone:

http://www.cs.uic.edu/~liub/FBS/fake-reviews.html

I'm pretty sure the review filter (that everyone loves to hate) works this way.

I've assumed it's something like that, but it shouldn't be too hard for them to be a little more forthcoming about their methodology without compromising its value.