Hacker News
by titzer 1066 days ago
After having used computers for a few years (oh, about 30), and having grown up with the internet (first dialup at age 16 in 1996), I gotta say,

Access to information should not have (ranked, optimizable, commercially-bent) search as its base interface. This is a cosmic, civilization-level screwup. Not only do you get the decades of SEO and advertising, but the whole system becomes lossy over time. You can't find what you used to anymore.

4 comments

What's the alternative? Manually curated directories lost to search. Anything completely manual isn't going to be "web-scale". Anything algorithmic is ranked and optimizable.
Wikipedia is a pretty decent, manually curated repository of encyclopedic information. Google search gets worse and worse every year but Wikipedia is pretty stable over time.

Sure, it has its own problems with edit wars and editor politics, but compared to the cesspool of SEO spam on Google, Wikipedia is an information paradise.

Wikipedia is the barest, most insignificant smidge of the knowledge that's available on the internet. Yes, it's incredibly useful, but no, you can't stay within its bounds of knowledge for any appreciable time if you're actually trying to accomplish something beyond falling into a Wikipedia hole.

As for manually curated content, Google and other search companies were solving that problem in 1998 precisely because manual curation wasn't feasible for the amount of content on the internet. It's been 25 years since then, and we're not exactly producing less content online.

To further support my argument in favour of manual curation, I would like to ask: why, for the past few years, have people ‘in the know’ been appending “Reddit” to their search strings on Google? Because Reddit is the largest repository of manually curated links, discussions, and original non-encyclopedic information on the internet.

People want manually curated information. They don’t want automation which by its very nature, as a fixed set of parameters, can and will be gamed.

I feel like Reddit is a bad example for an argument against search, because it still relies heavily on search. People aren't finding a specific subreddit for their query and then browsing it until they find what they're looking for, and they're usually not even searching directly on Reddit (because of the poor search quality), they're searching on Google.

What does "manual curation" mean in this context? I think it really just means "absence of spam/low quality content", which manual curation is not strictly necessary for.

Yahoo, back in the days before Google, was a manually curated search engine. They had a team of people doing the indexing work and creating the database from which the search engine pulled results. In principle, this avoids SEO spam because the manual curators aren't going to add spam sites to the index. Reddit functions similarly because, in principle, users aren't going to upvote spam posts.
Ah. But we're still using some algorithm that's doing curation to even look at reddit. I see your point about reddit being manually curated via upvotes, interaction, and moderation, but there still has to be something in between to even find the content and match it with what we're looking for. Even reddit's too big to just be a page of links, a la Yahoo.

Reddit has avoided

Because they want a bias-filtered selection of the internet. It is very obvious looking at the reddit comments that most are made by bots and not humans; the difference is that most behave similarly and the upvoting favors sameness.
That’s not at all obvious to me. What subreddits do you visit? The ones I frequented had maybe 5% or less bot comments. I think mods generally banned the bots from those subreddits too.
It's less about Wikipedia itself and more about its citations. Is Wikipedia sufficient to find nearly any important[0] information if you follow hyperlinks, or if you also look up physical references? What are the characteristics of the information that's missing? How long can you avoid Google for if you go along the chain of citations and links?

[0] objectively quantifying importance is an entirely different can of worms, but people know what's important to them

> Manually curated directories lost to search.

I rather suspect that they'll regain some popularity. Probably not general-purpose ones, but a variety of directories each covering a specific kind of site.

> Anything completely manual isn't going to be "web-scale".

True, but even so, it might still be an improvement. Especially if it's hundreds of different directories, each with their own tight focus.

Lists/"awesome lists" and lists of lists are great.

https://github.com/jnv/lists

Manual curation, the way libraries work. (Or at least, good libraries) And search functions within those curated datasets.
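
To make "search functions within those curated datasets" concrete, here is a rough sketch, assuming a small hand-maintained directory. The `Entry` type, the `search` function, and the topic labels are all hypothetical and only for illustration. The point is that the search is a plain filter with a fixed alphabetical order, so there is no relevance score for anyone to optimize against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    title: str
    url: str
    topics: frozenset  # labels assigned by a human curator, not by the page author


# Hypothetical hand-curated directory; the entries and labels are illustrative.
DIRECTORY = [
    Entry("Lists of lists", "https://github.com/jnv/lists",
          frozenset({"lists", "directories"})),
    Entry("Wikipedia", "https://www.wikipedia.org",
          frozenset({"encyclopedia", "reference"})),
    Entry("Hacker News", "https://news.ycombinator.com",
          frozenset({"discussion", "links"})),
]


def search(directory, term):
    """Return entries whose title contains the term or whose topics include it.

    Results are sorted alphabetically by title. No ranking signal is involved,
    so inclusion depends only on what the curator chose to list.
    """
    term = term.lower()
    hits = [
        e for e in directory
        if term in e.title.lower() or term in e.topics
    ]
    return sorted(hits, key=lambda e: e.title.lower())


if __name__ == "__main__":
    for entry in search(DIRECTORY, "reference"):
        print(entry.title, entry.url)
```

The trade-off is the one raised elsewhere in the thread: this only covers what somebody bothered to add, so it works for tightly focused directories rather than the whole web.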
Who is going to be the neutral arbitrator who assigns Dewey Decimal numbers or Library of Congress codes to web pages? Can’t trust the page authors, that’s for certain.
No one. Remember the big list of cool blogs that were generated off HN a couple weeks ago? That's one curated dataset. People can start their own curated databases and then see if others find them useful. Big ones become well-known, like big libraries. Small ones might become known for having the best resources for niche subjects.
Alphabetic? AAAPlumbing (btw my wife gets tons of business because her name starts with an A, and the insurance website is sorted by alphabet)

By date?

> What's the alternative?

Apple blocks all tracking in Safari by default and builds a search engine without ads for iCloud+ subscribers.

Google was absolutely incredible when it first showed up. Now it is not great. When I want information all it gives me is shopping or listicles one step removed from shopping.
I've had good luck promoting my game on social media, but when someone hears the name and Googles it, they'll get results like pic related, where someone has embedded an ad-wrapped knock-off of the game which contains more adverts. https://imgur.com/a/mz07uyR

Personally I can't find much on Google if I'm actually searching for something I don't know about. Finding the local mechanic is fine but querying a programming question, health problem or recipe is absolutely fucked. I sold all my Google stock recently but should have done so sooner. The value they have is in gmail and gsuite accounts but I can see the search business is circling the drain.

Conspiracy theory: they love showing shitty sites because those sites display adverts making Google money.

Google will eat at it further with stuff like this from the article:

"Google knows this, so it adds a “Question and Answers” block before the first SERP result, probably diverting a tremendous amount of traffic away from that page."

They scrape answers from sites that they used to send traffic to and publish it themselves. In many cases, though, those sites won't exist without the incentive of organic search traffic.

I get why Google does it, but it does create a feedback loop where less of that info will be out there to scrape.