Hacker News
by thr0waway1239 3570 days ago
Speaking of which, it seems possible for a computer to detect content which is just mostly marketing, versus content which is not (based on how spam filters work). The search engine should just show a "marketing index" score right next to each result. Even better would be to whitelist certain sites (Wikipedia, popular .edu and .org domains) to begin with and prioritize those results.
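A rough sketch of what that could look like, assuming each result already carries a relevance value and a marketing index between 0 and 1 from some upstream classifier. The `Result` type, the host lists, and the scoring scale are all made up for illustration, and treating every .edu and .org host as trusted is a simplification of "popular" domains:

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist. A real one would be curated, not suffix-based.
TRUSTED_HOSTS = {"en.wikipedia.org"}
TRUSTED_SUFFIXES = (".edu", ".org")


@dataclass
class Result:
    url: str
    relevance: float        # assumed to come from the underlying search engine
    marketing_index: float  # assumed scale: 0.0 (none) to 1.0 (pure marketing)


def is_trusted(url: str) -> bool:
    """True if the URL's host is on the allowlist or has a trusted suffix."""
    host = urlparse(url).hostname or ""
    return host in TRUSTED_HOSTS or host.endswith(TRUSTED_SUFFIXES)


def rank(results):
    """Trusted sites first, then by descending relevance within each group."""
    return sorted(results, key=lambda r: (not is_trusted(r.url), -r.relevance))


def render(result: Result) -> str:
    """Show the marketing index right next to the result."""
    return f"{result.url}  [marketing index: {result.marketing_index:.2f}]"
```

Here the marketing index is only displayed, not used for ordering, which matches the "show the score next to the result" idea; folding it into the sort key would be a separate design choice.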

It would likely be really niche, but it could become the anti-Google, which would be great when people actually seek an alternative to all the noise.

I think it has to be a non-profit like Wikipedia itself; I cannot imagine a model where it can also make money. The submitted site is a candidate, but it has to improve the search quality, as other commenters pointed out.

2 comments

> seems possible for a computer to detect content which is just mostly marketing

But based on the spam race, marketers will then tune content so that it doesn't trip those filters. Paid news and journal articles, etc.

Extremely relevant XKCD:

https://xkcd.com/810/

But yes, even here on HN there are problems distinguishing between legitimate articles and paid news.

I agree. It's a pretty tough problem, but it is fine to cross that bridge when we come to it. If the search engine stays really niche, it may not even be worth the spammers' effort, while still doing enough to cater to its somewhat self-selecting audience. For example, the number of people who want to get to the front page of HN is likely to be a really minuscule fraction of the people wanting to get to the top of search results.

Also, I wonder if it is possible to detect promotional content by analyzing things like calls to action and such?
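As a minimal sketch of the call-to-action idea, one could count promotional phrases per 100 words. The phrase list below is hypothetical and far too short for real use, and a marketer could trivially reword around it, so treat this as a toy signal rather than a working detector:

```python
import re

# Hypothetical call-to-action phrases; a real list would be much longer
# and probably learned from labeled data rather than hand-written.
CTA_PATTERNS = [
    r"\bbuy now\b",
    r"\bsign up\b",
    r"\badd to cart\b",
    r"\bfree trial\b",
    r"\border today\b",
    r"\bsubscribe\b",
]
CTA_RE = re.compile("|".join(CTA_PATTERNS), re.IGNORECASE)


def cta_density(text: str) -> float:
    """Return the number of call-to-action phrases per 100 words."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    hits = len(CTA_RE.findall(text))
    return 100.0 * hits / len(words)
```

A page scoring well above the typical density for its topic would be a candidate for a higher "marketing index"; where to put the threshold is an open question.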

Interesting that if this were a parsing problem (i.e. https://news.ycombinator.com/item?id=12478538), folks would immediately suggest accepting known good output, instead of trying to blacklist specific problems. The analogue in search would be something that looks more like a directory than an internet-wide search engine.

Of course what killed directories in the early web is that they had no hope of scaling.

If they hide their sell links, shopping baskets, closing pages and such, then they'll kill the sales though.

You can have paid news, but it's not doing anything if the mark can't buy the product afterwards because you had to remove all associations with selling to get the news to rank.

It seems that you could just use Google's algorithms and modify the site trust metric using a front-page spam-score, whilst reducing the effect of link-juice from links with associated marketing keywords ("buy the doohickey on this link", or whatever).
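To make that concrete, here is a toy sketch of the two adjustments. It says nothing about how Google's actual algorithms work; the function names, the keyword set, the linear penalty, and the damping factor are all assumptions picked for illustration:

```python
# Hypothetical marketing keywords for anchor text.
MARKETING_KEYWORDS = {"buy", "discount", "deal", "coupon", "sale"}


def adjusted_trust(base_trust: float, spam_score: float, penalty: float = 0.5) -> float:
    """Scale a site's trust down linearly by its front-page spam score.

    spam_score is assumed to lie in [0, 1]; penalty caps how much trust
    a fully spammy front page can remove.
    """
    return base_trust * (1.0 - penalty * spam_score)


def link_weight(anchor_text: str, base_weight: float = 1.0, damping: float = 0.25) -> float:
    """Reduce the weight a link passes on if its anchor text looks like marketing."""
    words = {w.strip('.,!?"').lower() for w in anchor_text.split()}
    if words & MARKETING_KEYWORDS:
        return base_weight * damping
    return base_weight
```

With these defaults, a link whose anchor reads "buy the doohickey on this link" passes a quarter of the weight of a neutral link, and a site with a spam score of 0.5 keeps 75% of its base trust.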

Keeping marketing sites high in your SERPs would make you way more money on referrals though.

Solving algorithmic tasks by just building an ML model of your competitor's algorithm seems like a funny way to start. I imagine this to be the way "programming" will stop being a thing in a few hundred years.

At the moment it probably would not work for web search, because I imagine several models are involved between feature extraction and the final result, each producing intermediate results.