

by incongruity 4068 days ago

The key here is developing a good set of convergent measures, not all of which would need to be available for a given review in order to weight it (think a Bayesian scoring approach?).

- Age of account

- Account engagement (patterns matter too)

- Check-ins via a mobile device at the business

- Check-ins at other nearby businesses (patterns matter too)

- Partner with a credit card company to offer Yelp reward bucks that encourage reviews, track usage, and validate reviews (and reviewers) [Use this to feed into the engagement score, above]

- Partner with OpenTable (they do this) and prompt reviews after attendance (they do this); weight these reviews more heavily. [Use this to feed into the engagement score, above]

- Let me actually identify myself to Yelp or to the world (or just to business owners), ONLY if I want to [Use this to feed into the engagement score, above]

- Reviews of similar businesses (e.g., I like Thai food); patterns matter here. Do I rate all competitors poorly? Did I post all my reviews in one day?

- Do my reviews vary significantly in a systematic way from others in a category? (This shouldn't be enough on its own, but the variance might mean something.)

- Do I post photos of the place? (Factor this into the engagement score, above; but if I only do it for one business, it might be a flag.)

etc., etc.
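As a rough illustration of how a few of the pattern signals in that list could be computed, here is a minimal sketch. The `Review` record, its fields, and `reviewer_features` are hypothetical names invented for this example (not anything Yelp exposes), and it covers only three signals: account age, "did I post all reviews in one day", and systematic deviation from others in the same category.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Review:
    # Hypothetical minimal review record; a real system would carry far more.
    reviewer: str
    business: str
    category: str
    stars: int
    posted: date


def reviewer_features(reviews, reviewer, account_created, today):
    """Compute three of the pattern signals for one reviewer (illustrative only)."""
    mine = [r for r in reviews if r.reviewer == reviewer]
    if not mine:
        return {}

    # Age of account, in days.
    age_days = (today - account_created).days

    # Share of this reviewer's reviews posted on their single busiest day.
    per_day = Counter(r.posted for r in mine)
    burst_share = max(per_day.values()) / len(mine)

    # Average gap between this reviewer's stars and the mean stars that
    # everyone else gave in the same category.
    others_by_cat = defaultdict(list)
    for r in reviews:
        if r.reviewer != reviewer:
            others_by_cat[r.category].append(r.stars)
    gaps = [r.stars - mean(others_by_cat[r.category])
            for r in mine if others_by_cat[r.category]]
    mean_gap = mean(gaps) if gaps else 0.0

    return {"age_days": age_days, "burst_share": burst_share, "mean_gap": mean_gap}
```

For a two-day-old account that posted two 1-star Thai reviews on the same day while others gave the same places 4 and 5 stars, this yields an age of 2 days, a burst share of 1.0, and a mean gap of -3.5 stars. None of these is damning alone, which is why they would feed a combined score rather than act as hard rules.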

Really, a statistical model shouldn't be that hard to build. Maybe processor-intensive, but hard? It doesn't feel hard, given all the data they're sitting on...
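One way to read "Bayesian scoring approach" is a naive-Bayes-style combination: start from a prior belief that a review is genuine and add one log-likelihood ratio per signal that is actually known for that review, skipping the rest. The sketch below assumes the signals are conditionally independent. The signal names, the likelihood ratios, and the 0.8 prior are all made-up placeholders; a real system would have to estimate them from labelled data.

```python
import math

# Hypothetical, made-up likelihood ratios for each signal:
# (P(seen | genuine) / P(seen | fake), P(not seen | genuine) / P(not seen | fake))
SIGNAL_RATIOS = {
    "old_account": (3.0, 0.7),
    "checked_in_here": (4.0, 0.8),
    "opentable_visit": (6.0, 0.9),
    "all_reviews_same_day": (0.2, 1.1),
}


def genuine_probability(signals, prior=0.8):
    """Naive-Bayes combination of whatever signals are known for one review.

    `signals` maps signal name -> True/False. Signals missing from the dict
    are treated as unknown and contribute nothing, so no single signal is
    required for a review to get a score.
    """
    log_odds = math.log(prior / (1 - prior))
    for name, seen in signals.items():
        if name not in SIGNAL_RATIOS:
            continue  # unrecognized signal: ignore rather than guess
        ratio_seen, ratio_unseen = SIGNAL_RATIOS[name]
        log_odds += math.log(ratio_seen if seen else ratio_unseen)
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


print(round(genuine_probability({"checked_in_here": True}), 3))  # 0.941
```

Working in log-odds keeps each signal's contribution additive, so an OpenTable visit or a card transaction can simply be a large ratio while weak signals nudge the score only slightly.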


nerfhammer 4067 days ago

Work has been done on this. Apparently you can detect fake reviews to some degree using text features alone:

http://www.cs.uic.edu/~liub/FBS/fake-reviews.html
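To give a feel for what "text features alone" can mean, here is a minimal multinomial naive Bayes classifier over word counts with add-one smoothing. This is a generic textbook technique, not necessarily the method described on the linked page; the `train`/`classify` helpers and the "fake"/"genuine" labels are hypothetical, and it presumes you already have labelled reviews to train on.

```python
import math
import re
from collections import Counter


def tokens(text):
    # Lowercase word tokens; a deliberately crude tokenizer.
    return re.findall(r"[a-z']+", text.lower())


def train(docs):
    """docs: list of (text, label). Returns per-label word counts and doc counts."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest log-posterior for `text`."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for w in tokens(text):
            # Add-one smoothing so an unseen word doesn't zero out a class.
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Real work in this area uses richer features (n-grams, review length, near-duplicate detection across reviews), but the structure is the same: learn which wording is typical of each class, then score new text against it.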


derwiki 4068 days ago

I'm pretty sure the review filter (that everyone loves to hate) works this way.


incongruity 4068 days ago

I've assumed it's something like that, but it shouldn't be too hard for them to be a little more forthcoming about their methodology without compromising its value.
