by pavedwalden 2409 days ago
Originally, Google was very effective at looking at the interconnections of content people had put online and using that to infer which pages were most relevant. SEO tactics immediately started gaming this system to create false signals of relevance, but for many years Google did an impressive job of staying ahead of that game.

I think what finally killed their search quality is the fact that there's no longer a public human-curated network of websites to draw meaning from. Most content on the web is bulk-generated crap, personal blogs and websites are rare, and many passionate hobbyist communities are hidden from crawlers in places like closed Facebook groups.
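
For readers who want a concrete picture of "looking at the interconnections of content", here is a minimal sketch of link-based ranking in the style of the published PageRank idea. It is an illustration under stated assumptions, not Google's actual system: the function name `pagerank`, the damping value 0.85, the iteration count, and the tiny example web are all made up for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified power-iteration PageRank.

    links maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical hand-made web: two small pages link to a reference site.
web = {
    "hobby-blog": ["reference-site"],
    "forum-post": ["reference-site"],
    "reference-site": ["hobby-blog"],
    "orphan-page": [],
}
scores = pagerank(web)
best = max(scores, key=scores.get)
```

In this toy graph the page that other human-made pages link to ends up on top, which is the signal the comment says has dried up as hand-curated linking has become rare.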

5 comments

I wonder what the next search engine algorithm will be... I've been thinking about up-votes/down-votes. And I could swear Google experimented with random ranking a few years back, which gave very good results, but it was a bit annoying when you forgot to bookmark a site and couldn't find it when searching for it again the next day. But I'm thinking random plus voting: if you find a good site, you upvote it; if you find spam, you downvote it. If you want to find that good site again, you just check your upvote history. Random sorting gives small, interesting web sites a chance to get found. When doing a search you can use sliders to fine-tune the order using popularity (upvote/downvote ratio), last updated, first found, and region, plus a checkbox to only show links you haven't already clicked on or voted on. And there could be an option to discover sites you would probably like, based on which links you have upvoted and what people who upvote sites similar to the ones you upvote have also upvoted, but you have not yet clicked on. The problem with upvote bots has already been solved by social networks like Facebook, where like-farms, follower-farms, etc. are not very effective. You should not be too biased toward the votes, though, or you would end up with echo-chamber loops like with Spotify and YouTube. The key is randomness! A random, brute-force-like algorithm is often just as good as, and sometimes better than, a sophisticated system. It would be hard to game/cheat a random algorithm.
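
A rough sketch of what "random plus voting" could look like, assuming a single `randomness` slider between 0 and 1 and a smoothed upvote ratio as the popularity measure. The function `rank_results`, the dictionary fields, and the example sites are hypothetical; the other sliders mentioned above (last updated, region, and so on) are left out to keep it short.

```python
import random


def rank_results(sites, randomness=0.5, clicked=None, hide_clicked=False, rng=None):
    """Order sites by a blend of vote popularity and a random draw.

    randomness=0.0 sorts purely by votes, randomness=1.0 is a pure shuffle.
    All names and fields here are illustrative assumptions.
    """
    if rng is None:
        rng = random.Random()
    clicked = clicked or set()

    if hide_clicked:
        # The "only show links you haven't already clicked on" checkbox.
        sites = [s for s in sites if s["url"] not in clicked]

    def score(site):
        up, down = site["up"], site["down"]
        # Smoothed upvote ratio so a site with very few votes is not
        # pinned to exactly 0 or 1.
        popularity = (up + 1) / (up + down + 2)
        return (1.0 - randomness) * popularity + randomness * rng.random()

    return sorted(sites, key=score, reverse=True)


sites = [
    {"url": "small-hobby-site.example", "up": 3, "down": 0},
    {"url": "big-portal.example", "up": 900, "down": 100},
    {"url": "spam-farm.example", "up": 2, "down": 40},
]

by_votes = rank_results(sites, randomness=0.0)
mixed = rank_results(sites, randomness=0.5, rng=random.Random(42))
unseen = rank_results(sites, randomness=0.0,
                      clicked={"big-portal.example"}, hide_clicked=True)
```

With the slider at 0 the order is fully determined by votes; raising it gives the small site a chance to surface above the big one, at the cost of results that change between searches.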
Google already has the equivalent of upvoting without requiring anything from the users.

If you click into a site and you don't return to your search, they consider that an upvote. If you continue clicking different search results they consider that a downvote for the site you visited.
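
The heuristic described here can be sketched as follows, assuming each search session is recorded as the ordered list of results the user clicked. The function name `implicit_votes` and the session format are assumptions for illustration; nothing here is taken from how Google actually processes click data.

```python
from collections import Counter


def implicit_votes(sessions):
    """Turn click sequences into +1/-1 votes.

    Each session is the ordered list of result URLs clicked for one query.
    The last click counts as an upvote (the user did not come back),
    every earlier click as a downvote (the user returned and tried again).
    """
    votes = Counter()
    for clicks in sessions:
        if not clicks:
            continue  # the user clicked nothing, so there is no signal
        for url in clicks[:-1]:
            votes[url] -= 1
        votes[clicks[-1]] += 1
    return votes


# Hypothetical click logs for three searches.
sessions = [
    ["a.example", "b.example"],
    ["b.example"],
    ["a.example", "c.example"],
]
votes = implicit_votes(sessions)
```

The empty-session branch shows one limitation: a search answered from the result preview, or abandoned entirely, produces no vote at all.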

Then that's not great, based on the way I (and, from observation, many others) use search results. I often middle-click (or control+click) several links which look promising, loading them in new tabs in the background. When I have a decent number, I then go visit the sites themselves.

In addition, the best Google searches result in finding the answer directly in the preview of the page from the search results. Unfortunately there's not a clear signal to Google which site preview provided the answer in this case, or even if none of them did and I just gave up searching.

In reality, I'm sure Google's determination of which site was relevant is much more sophisticated, likely involving some machine learning.

Since Google's spying on you, you'd think they'd be able to link the page you're reading to the search that triggered it, and feed the analytics back to the search. But Google won't use their spying to our benefit...
This is a really noisy signal.
All of this almost has me wishing for a human curated list of high quality sites for each topic, though that would obviously have its problems, too.
Yeah, I remember talking to a student over at our local college a few years back who had started doing that with a classmate. They called it Yet Another Hierarchical... something or other. Jerry, I think his name was. I probably should have joined them. It turned out to be quite an experience.
I'm building https://learnawesome.org for this. Started by collecting various awesome lists and then adding search, reviews, related items, etc. For example, when looking up a book, you can easily discover the TED talk or podcast given by the author which has the same ideas.

It's early days though. I imported around 10,000 MOOCs from various platforms and organized them by topics. The webapp is open-source so fixes/features are always welcome. :)

I remember DMOZ! Nowadays any curated list would need to have the properties of (a) being publicly editable and (b) being prose, because for the time being it's sufficiently difficult for machines to write prose. Or at least, I assume so, since everything popular except Wikipedia has been gamed.
That's what search engines used to be in the early days of the internet.
How do you stop the bots from upvoting? If you make people fill out a reCAPTCHA every time they upvote, they just won't bother.
Maybe only let people who are logged into their Google accounts vote? Surely that should stop most bots.
An anonymous, decentralized reputation system which any service could hook into.
Days later, the HN post "a virus ruined my virtual reputation and I'm now banned from the internet".
That's the secret sauce behind decentralization. Damage from any one actor, including yourself, is limited both temporally and in scope. Each service would get a unique hash relevant only to them and giving them access to only their scope, and you would keep your private keys safe just like you would SSH keys, crypto wallet keys, or any other sensitive data. If you can't be bothered to do that, then a minor hit to your reputation is deserved from a philosophical standpoint, even if we work to make the system robust and resilient.
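
One way to read "each service would get a unique hash relevant only to them" is a per-service pseudonym derived from a single private key. The sketch below uses HMAC-SHA256 for that derivation, which is an assumption on my part rather than anything specified in the comment; the function name `service_identity` and the service names are hypothetical.

```python
import hashlib
import hmac
import secrets


def service_identity(private_key: bytes, service: str) -> str:
    """Derive a stable, service-specific identifier from one private key.

    The same key and service always give the same identifier, while
    identifiers for different services cannot be linked to each other
    without the key. HMAC-SHA256 is an assumed choice for illustration.
    """
    return hmac.new(private_key, service.encode("utf-8"), hashlib.sha256).hexdigest()


# The user keeps this secret, like an SSH key.
private_key = secrets.token_bytes(32)

forum_id = service_identity(private_key, "forum.example")
search_id = service_identity(private_key, "search.example")
```

Because each service only ever sees its own identifier, a ban or a reputation hit at one service stays within that service's scope, which is the containment property the comment describes. A real system would also need signing, key rotation, and revocation, none of which is covered here.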
Including the spammers?
The idea is that it would take a long time and lots of effort to build a credible reputation.

Someone just beginning to use the internet would either need to slowly build reputation in the most open spaces, or get one or more people to vouch for them through an API.

I want to make it fiscally infeasible to run a mass persona farm.

> I've been thinking about up-votes/down-votes.

I don't have proof, but I'm sure the number of votes on, e.g., Reddit (or perhaps how long a post was on the front page) has an effect on search results; see the Comcast swastika[0] and Trump showing up when searching 'Idiot'[1].

0: https://old.reddit.com/r/funny/comments/403brr/10_months_lat...

1: https://thehill.com/policy/technology/397920-activists-manip...

I don't think it actually has anything to do with upvotes, but is just an indirect effect since highly upvoted pages are generally more trafficked, and it's the high traffic that really matters.
I would agree with this if not for my recent experience with DuckDuckGo. While Google's search result quality has sharply deteriorated in recent months, DuckDuckGo's results have stayed flat for the most part, to the point where DDG's result quality is more often superior to Google's. If it were a matter of SEO manipulation, wouldn't DDG's quality decline too?
DuckDuckGo has an inherent advantage: nobody cares.

They have a tiny percentage of the market, so there will be a tiny percentage of money invested in gaming their algorithms.

And also, why doesn't DDG grow in market share if Google's results are worse?

My hypothesis is that Google's results aren't bad - they are actually serving a set of results which satisfies the laymen, and it's only the technical, highly niche users with whom the results don't align.

DDG is growing exponentially; Google is just massive, and it takes a while to grow big enough to notice: https://duckduckgo.com/traffic
> human-curated network

I'm surprised they haven't brought this in with an option so people can rate or at the least flag sites. Sourcing user feedback directly would probably help wipe out results like Pinterest that dominate so many searches but that people find unhelpful.

It would be great if the drop-down that offers the cached copy also had an option like 'don't show this domain', or something else that would let people flag sites that don't seem to offer value for the search terms.

Also on the 'computer' search I was amazed how everything was now in a left column and how much blank space on the right there was: http://prntscr.com/pw5iav - is that mobile optimisation overtaking desktop experience?

What are the business reasons against this? I so wanted the ability to hide specific domains from Google search, that I installed an extension, and then it stopped working. Luckily I was able to find a Tampermonkey script that does the same job.
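
For reference, the core of a "hide this domain" filter is small. The sketch below shows the matching logic only, assuming the results are available as a plain list of URLs; the function name `hide_blocked` and the example URLs are made up, and this is not the extension or the Tampermonkey script mentioned above.

```python
from urllib.parse import urlparse


def hide_blocked(results, blocked_domains):
    """Drop result URLs whose host is a blocked domain or a subdomain of one."""
    blocked = {domain.lower() for domain in blocked_domains}

    def is_blocked(url):
        host = (urlparse(url).hostname or "").lower()
        # Match the domain itself and its subdomains, but not unrelated
        # hosts that merely end with the same letters.
        return any(host == d or host.endswith("." + d) for d in blocked)

    return [url for url in results if not is_blocked(url)]


results = [
    "https://www.pinterest.com/pin/1",
    "https://example.org/model-trains",
    "https://notpinterest.com/page",
]
visible = hide_blocked(results, ["pinterest.com"])
```

Matching on the parsed hostname rather than on a substring of the URL avoids hiding unrelated sites whose names happen to contain a blocked domain.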
Exactly. Almost three years ago, I wrote this:

[…]

What do I mean with “the threat of Facebook”? In the old days, before today’s large “social media” sites, people made their own web pages on places like GeoCities or on simpler social-media-like sites like LiveJournal, etc. Those sites all had content and linked to each other. This is the web which the Google search engine and its algorithm was meant to find things in, and it worked very nicely, as it took advantage of the links other people had made to your site as a proxy for relevance in search results for your site. People making small web pages about their favorite topics (with lots of links to other people’s pages, since information was hard to find) could slowly and easily transition into making larger and larger reference web sites with lots of information, thereby attracting lots of incoming links from others, which in turn enabled people to find the information using Google’s search engine.

Compare this to now. Firstly, people having a Facebook account have no place to simply place information, no incentive to simply make a web page about, say, tacos or model trains, because that’s not what Facebook is about. Facebook is about the here-and-now, and whatever is yesterday is forgotten. As I understand it, there is no real way, in Facebook, to make a continuously updated page with a fixed address for people to go to as a reference point about some subject, or at least people are not directed towards doing this as part of their online activity (as opposed to in the past, when it was basically the only thing which people could do). Secondly, this makes it so that people have no natural path going from using Facebook to creating a larger web site with information, and there are no smaller model train or taco Facebook “pages” which could have links to your larger site and thereby validate its relevance. Thirdly, even if this second point was false, Google could not use these Facebook pages, because Google cannot crawl them. These pages are all internal to Facebook, and Facebook has every incentive to not allow Google to crawl and search this information. Facebook would much rather people used their own site to search, and thereby gaining all of Google’s sources of income: User monitoring and advertising.

https://news.ycombinator.com/item?id=13295456

Facebook isn't the only one that killed it. Wikipedia did too, since there's a huge incentive to centralise all the information in one place. If you write one sentence in a wiki article, it'll be read by many, many more people than if you write ten pages on your own site. So now, if you want to write information that doesn't sit well in Wikipedia, or express a perspective that is not mainstream enough to fit in Wikipedia, then you can just do it in the Facebook-HackerNews-Reddit "click and it's gone" osphere.

I'm increasingly of a mind to create an old school pre-blogging web page. That was the ideal of the web. I have enough content, it's just the time.

You’re too kind.

Google could penetrate all of that, if they wanted to.

When I search for a fairly specific thing and the first page of results doesn't even include half the search terms, that indicates quite clearly they don’t give a fuck about anything other than ad revenue.

Set ya default search engine to DuckDuckGo.