Hacker News
by ouEight12 1703 days ago
> The ranking of best VPNs on the site is mostly a ranking of VPNs that offer the largest referral fees [1].

Can't we replace "VPNs" with pretty much any service at this point though?

I haven't trusted 'review/ranking' sites in ages, because after seeing the same top-5 "best hosting providers ever!" lists on 3 sites, you kind of get the hint.

As someone who has been working on an honest web hosting review site for a decade now, I can say you're totally right. I see the same pattern the article describes for fake review sites. The biggest offender in hosting was Endurance International Group, which bought up and owned so many major brands. You'd often find a ranking full of the brands they owned (BlueHost, HostGator, iPage, JustHost, Site5, Arvixe, etc, etc, etc).

Since you're really skeptical, I'd love to hear your take on what I've done (and been doing) in terms of trying to create an honest system.

The gist is: I scrape Twitter data, filter out spam, affiliate links, etc., and use sentiment analysis to see which brands people actually like. My hypothesis was that writing reviews is fundamentally a weird human behavior; the real 'reviews' are embedded in the normal conversations you have with people. With enough of these signals, you can get a much better picture of what people really think. The results seem to line up roughly like an NPS (Net Promoter Score) measurement.
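
The filter-then-score-then-aggregate pipeline can be sketched roughly like this. It is a minimal illustration, not the custom NLP described in this thread: the affiliate patterns, the tiny word lists standing in for a real sentiment model, and the names `Mention` and `nps_like_score` are all hypothetical. The aggregate borrows NPS arithmetic: percentage of positive mentions minus percentage of negative ones, with neutral mentions counted in the total.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical affiliate/promo patterns; a real filter would need a curated list.
AFFILIATE_PATTERNS = [
    re.compile(r"[?&](ref|aff|affiliate)=", re.IGNORECASE),
    re.compile(r"\b(coupon|promo code|discount code)\b", re.IGNORECASE),
]

# Toy word lists standing in for a real sentiment model (an assumption).
POSITIVE = {"love", "great", "fast", "reliable", "recommend", "happy"}
NEGATIVE = {"down", "slow", "outage", "terrible", "awful", "cancel"}


@dataclass
class Mention:
    brand: str  # which host the tweet is about
    text: str   # raw tweet text


def is_spam_or_affiliate(text: str) -> bool:
    """True if the tweet looks like a promotion rather than conversation."""
    return any(p.search(text) for p in AFFILIATE_PATTERNS)


def sentiment(text: str) -> int:
    """Label a tweet +1 (positive), -1 (negative) or 0 (neutral) by word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def nps_like_score(mentions: list[Mention], brand: str) -> float | None:
    """% positive minus % negative among the brand's non-promotional mentions."""
    labels = [
        sentiment(m.text)
        for m in mentions
        if m.brand == brand and not is_spam_or_affiliate(m.text)
    ]
    if not labels:
        return None  # too little data to rank this brand
    return 100.0 * (labels.count(1) - labels.count(-1)) / len(labels)
```

Each brand ends up with a score between -100 and 100, much like an NPS; a brand whose mentions are all filtered out returns `None`, which mirrors smaller hosts going unlisted for lack of data.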

https://reviewsignal.com/webhosting/compare has all my data if you want to see how the rankings actually look. Not every company has an affiliate program. Many smaller companies aren't listed because I can't get enough data.

What has helped me a lot in making better purchasing decisions on sites with a rating system à la Amazon is reading only the 3-star reviews (or whatever is in the middle).

The 5-star reviews are somewhere between suspicious and just not that useful, because people are overly excited, and the 1-star reviews are often from people who just had bad luck or didn't understand what the product is and who it's for.

Meanwhile, I feel the 3-star reviews are the most sober ones, often pointing out flaws (and every product/service has them), which lets me make a more informed decision about whether those flaws will affect me at all or be a showstopper.

That's why I'm a bit skeptical about sentiment analysis and similar approaches, regardless of how well they work. I'm not convinced that excitement is actually that good a signal. E.g. there are many movies, books, etc. that are generally well received but that I don't like at all. That doesn't make the other people or me wrong; I just have different expectations and preferences.

Similarly, for tech services I would much, much rather have an easy way to map a system's capabilities and limitations to my use case and budget than know whether other people like or dislike it.

Honestly, I don't see a 5-star rating as being euphoric/excited about the product.

5 stars is basically the baseline if the product is good enough for the price it's sold at. Giving anything other than 5 stars is a massive FU to the vendor, since dropping below 4 is basically a death sentence for the listing. That can certainly be warranted, but only if something was very wrong.
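
The arithmetic behind "dropping below 4" is easy to check. This sketch is illustrative only, not any marketplace's ranking logic; the function name `five_stars_needed` and the default threshold of 4 are assumptions. With k low ratings and n extra 5-star ratings, the mean stays at or above a threshold t exactly when n >= (t*k - sum(low)) / (5 - t).

```python
from __future__ import annotations

import math
from fractions import Fraction


def five_stars_needed(low_ratings: list[int], threshold="4") -> int:
    """Fewest extra 5-star ratings that bring the mean back to >= threshold.

    Solves (sum(low) + 5n) / (len(low) + n) >= t for the smallest integer n,
    i.e. n >= (t * len(low) - sum(low)) / (5 - t). Exact fractions avoid
    float rounding for thresholds passed as strings like "4.2".
    """
    t = Fraction(threshold)
    if not 0 < t < 5:
        raise ValueError("threshold must be strictly between 0 and 5")
    deficit = t * len(low_ratings) - sum(low_ratings)
    if deficit <= 0:
        return 0  # the low ratings alone already average at least t
    return math.ceil(deficit / (5 - t))
```

For example, `five_stars_needed([1])` returns 3 and `five_stars_needed([3])` returns 1: a single 1-star rating has to be offset by three 5-star ratings just to get the average back to 4.0.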

The biggest issue I'd see with your approach is how hard it's going to be to separate bots talking to each other from actual people writing these messages.

Most research on the matter seems to conclude that anywhere between 25% and 70% of these messages are written by bots.

The upper end is that high because it's actually quite hard to confidently establish that a message was written by a human. Surprisingly, that's not because bots are hard to classify, but because people often write borderline-incoherent messages too.

I don't disagree with you; I should have made it clearer that I use them as one part of an overall evaluation process.

As an example of how I use them: some time ago I was looking into buying an audio hardware unit for music production (advanced hobbyist use, I guess). Superficially, judging from the marketing copy and some reviews I had skimmed, it had everything I wanted, like MIDI in/out connectors.

Then I went to the 3 stars section and one of the first comments said:

"Great device, but had to return it, because it doesn't support part X of the MIDI protocol."

Whether this was written by a human or a bot is irrelevant. What matters is whether it is true and whether it affects me. In this specific case it simply didn't matter to me, so I ignored the comment. Had it mattered, the comment would have served as a red flag prompting further investigation into whether the claim is true.

I don't go through an elaborate process for everything I buy, especially not low-cost, everyday disposable items (I just buy different brands over time until I stumble upon one I like). But the more specialized the use case, and the bigger the buy-in and the cost of reversing the decision, the more I wish I had better tools to figure out whether a product/service actually matches my use case.

> The 5 stars are basically baseline if the product is good enough for the price it's sold at

That is absolutely not my interpretation of such a scale. I would naturally map 5 stars to "exceeds expectations". If the best a product can do is meet expectations, how do you disambiguate from the truly excellent?

One of a multitude of issues with reviews is our differing interpretations of what a score represents.

So, would you ever consider buying a product that has less than 4 stars?

Very few would, which means that giving anything but 5 stars effectively means "I don't want you to ever sell this product again."

In the early days of the “sharing economy” I gave a TaskRabbit cleaner 4 stars, thinking “they did a really good job!” But they called me nearly in tears, asking what they had missed to get five stars.

Paid reviews seem easy enough to spot: they hype up the product/service too much. I'd read a VPN review, notice those canned phrases, and sigh, because in my view the review site had just scammed me out of my time...

Maybe a bot that collects reviews and detects similar sentences could also rate those bullshit "review sites".

For hosting, it's pretty clear from who they put at the top and in the top few rankings. Is it one of the cheap hosts with high affiliate payouts? Fake. The problem is that people unfamiliar with the space might not be aware of this. For people who know, web hosting reviews are generally crap.

As for rating the sentences on bullshit review sites: how do you think that would work, and how would you train such a system? I'm worried about the non-paid side of the training data; where might one get enough sample data to distinguish 'normal' from 'paid'?

When a lot of the reviews use literally the same words or phrases, cut-and-pasted or filled into templates, it’s pretty transparently a scam.

Yeah, I pick that up with spam detection that looks for identical types of messages/content.
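
Catching copy-pasted or templated phrasing can be sketched with word shingles and Jaccard similarity, a standard near-duplicate technique. This is an illustration, not necessarily what the spam detection in this thread does; the shingle size `k=3`, the `0.5` threshold, and the function names are assumptions. Comparing every pair is quadratic, so a large corpus would call for something like MinHash with locality-sensitive hashing.

```python
from __future__ import annotations

import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences ("word shingles") in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(
    reviews: list[str], threshold: float = 0.5, k: int = 3
) -> list[tuple[int, int, float]]:
    """Index pairs of reviews whose shingle sets overlap by at least `threshold`."""
    sets = [shingles(r, k) for r in reviews]
    flagged = []
    # Compares every pair, so cost grows quadratically with the number of reviews.
    for i, j in combinations(range(len(reviews)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            flagged.append((i, j, score))
    return flagged
```

Two 14-word reviews that differ only in "VPN" versus "host" share 9 of their 15 distinct 3-word shingles, a similarity of 0.6, so that pair gets flagged, while an unrelated complaint about slow support scores 0.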

This is awesome. I'd love to see this exact setup applied to things other than web hosts!

If I’m reading your table correctly, you say Linode’s VPS plans start at $10/mo, but they’re actually $5/mo.

Is this meta-data, the stuff outside the rankings, hand-entered?

Yes, and I've updated it. Thanks!

I don't know how accurate your ranking method is (NLP is tricky), but the idea is very, very cool!

Yeah, I custom-wrote a lot of the NLP years ago. It was adapted from my Master's thesis, which predicted box office sales using sentiment analysis on Twitter data (plus a few other variables, like the number of theaters a film plays in). Back then, nothing off the shelf was accurate enough for my liking, so I had to custom-write a lot of the analysis, and it's fairly specific to web hosting when it comes to understanding context. I honestly haven't tested the newest sentiment analysis tools much; I wonder how accurate they would be. The biggest problem I'm finding, though, is that there is less data on Twitter. I'm not sure whether that's because fewer people are talking about these companies or just because Twitter volume is lower in general.