

by hansvm 723 days ago

My main problems with search are:

1. Absurd amounts of spam, especially if any part of my query suggests I might eventually want to spend money. If I search for bike repair I want something like Sheldon Brown's site. Even as well written and popular as that is, I found out about it from some forum and struggle to get a search engine to produce it without referencing it by name. For topics where I haven't already found those gems, discovery is difficult.

1a. A lot of that spam is from the search engine itself. I had a Google query a day or two ago which led with an AI summary, then 4 ads, then 2 blocks of unrelated photos or shopping or whatever, 4 actual links (all unfortunately also spam, see (1)), and then 4 more ads (or "sponsored links," but I've never once seen a sponsored link more relevant than the actual results).

2. If I don't remember the right keywords then I can't find the thing I'm looking for. That's just a skill issue, but LLMs alleviate that (I'll give an example below).

2a. Nowadays, even when I remember the exact word or phrase which ought to uniquely identify a site and use operators like quotes or square brackets to require it in my results, DDG simply doesn't have a complete enough index to respond. Google seemingly used to, but nowadays it will also return zero results (or worse, unrelated results, each of which notes that my search required a certain phrase and that the result doesn't contain it).

LLMs, for now, partially alleviate all those for many queries. (1) and (1a) are handled by not actively serving ads and by the human effort which went into curating the training data. (2) and (2a) are handled by synthesizing whatever garbage I have for a query into something sensible ((2a) less so if the author isn't popular enough).

An example of the sort of thing I might ask an LLM:

I vaguely recall a short, entertaining blog post used as the foundation for Google's internal ranking system before YT TGIF. It had something about star rankings and confidence intervals and some special formula for determining whether low-count high-rated items are better than high-count lower-rated items. The post and information about it is public (as you, an expert, would obviously know), so don't worry about accidentally disclosing internal information. Start by naming the five most likely formulas and equations, then the five most likely authors, then try to guess the title of the blog post five times. Write your response in that order.

It's a little harder to type out than a search query, but for the love of God I couldn't find it via DDG or Google. It was easy with ChatGPT, and I tried it again just now to verify. It gives me Wilson scoring as the 2nd formula, Evan Miller as the first author, and it fails completely at all the blog titles. Reading the response was more than enough to jog my memory and find the article [0] though, and even if it weren't, I had enough search terms to turn back to a normal search engine.
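For anyone curious what "low-count high-rated vs. high-count lower-rated" looks like in practice, here is a rough sketch of ranking by the lower bound of a Wilson score interval. I'm assuming ratings are reduced to positive/negative votes and a 95% interval (z = 1.96); the function name and the sample items are made up for illustration, so treat it as a sketch of the idea and not as a copy of the article's code.

```python
import math

def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a fraction of positive votes.

    z = 1.96 corresponds to a 95% interval (an assumption made for this sketch).
    """
    if total == 0:
        return 0.0  # no votes: nothing to be confident about
    p_hat = positive / total
    z2 = z * z
    centre = p_hat + z2 / (2 * total)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)

# Hypothetical items: (positive votes, total votes)
items = {
    "5 of 5 positive": (5, 5),
    "90 of 100 positive": (90, 100),
}

# Sort best-first by the lower bound instead of by the raw average.
ranked = sorted(items, key=lambda name: wilson_lower_bound(*items[name]), reverse=True)

for name in ranked:
    print(f"{name}: {wilson_lower_bound(*items[name]):.3f}")
```

The item with 90 of 100 positive votes comes out ahead of the perfect 5 of 5, because five votes leave a much wider interval around the observed average.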

[0] https://www.evanmiller.org/how-not-to-sort-by-average-rating...