I still don’t get why. It cannot be so difficult for them to keep things like literal search, can it? What is the incentive to remove it and replace it with a needlessly more complex almost literal but still fuzzy search?
I do suspect the main thing people complain about currently with Google is the abundance of ads and the algorithm that has encouraged stupid amounts of articles of a certain length. Recipe for baked potatoes is now 2000 words long.
> It cannot be so difficult for them to keep things like literal search, can it?
Greater scale = greater cost of keeping data hot in their search data-warehouses (esp. in light of contention over memory/caches.) Keeping around both a source-text string and its tsvector representation (or whatever Google's version of that is) is a "thing that doesn't scale" that they could provide at 1B queries/day, but probably not at 10B queries/day.
> the algorithm that has encouraged stupid amounts of articles of a certain length. Recipe for baked potatoes is now 2000 words long.
That's not the algorithm's fault per se; that's instead the fact that recipes can't be copyrighted, and so these sites can freely steal + repost one-another's recipes, and so you'll find the same recipe word-for-word on many sites, thus making an exact match in the recipe part not contribute highly to ranking any particular site. The 2000-word blog post, on the other hand, is actual Intellectual Property unique to the site posting it. So it only appears in the one place; and so when your query matches it, it ranks quite highly indeed.
Yes, it is. There are good recipe sites out there with authoritative, reliable content and fast loading times. Google says it prioritizes those things, I can identify sites that have them, and yet the algorithm doesn't favour them. That's the algorithm's fault no matter what memes about copyright law cause a proliferation of shitty websites.
What I'm saying is that the "recipe" part of a recipe website is a commodity – there is no "authoritative" source for a given recipe, unless that recipe is too niche in appeal to end up widely disseminated. This video (https://www.youtube.com/watch?v=SsNLzyqqINw) has a pretty good coverage of the topic.
Compare and contrast: phone-number directory listings. Who should Google cite as the authoritative source for lists of name-to-phone number associations? Nobody. All the lists are copying from each-other, curating and correcting the data taken from one-another, gathering their own original data for additions, and everything in between. Every portal overlaps every other portal, but mostly has the same stuff.
Compare and contrast, in the physical world: printings of public-domain literature. If Google indexed bookstores, which printing by which publisher would you want them to rank first on a search for e.g. Pride and Prejudice?
What I really want is biased search results of my choosing.
$10 a month for a personal search is a bit much. $10 a month for work related search is cheap. Give me results specific to my industry without having a super long query.
(Neeva team member here) re: recipes. You might like the Neeva recipe search experience. You can see an entire recipe and reviews (without the ads or intro text) without navigating away from the search results page. Quick example here: https://neeva.com/search?q=baked+potato&src=nvobar
The last time this came up, Google demonstrated that it still worked. Most of the examples people provided of it not working were actually just unexpected exact matches in HTML that the typical user doesn't see, so they look like false positives or "surprisingly good" results not based on the page content.
> What is the incentive to remove it and replace it with a needlessly more complex almost literal but still fuzzy search?
Control. They've moved from helping you find what you asked for, to trying to influence you into changing what you ask for to the thing that paid them the most.
Similarly, they're forcing creators to alter content to match their metrics or fall into obscurity.
Eh, it's not so much that it stopped working as it is that it never worked the way you thought it did.
Quotes have ~always been an exact match on the tokenized query text, not a substring match on the corpus text. No synonyms, reordering, gaps, etc, but the matches -- and failures -- are sometimes not obvious at first blush.
If you search for "don't stop me now", for instance, that "don't" tokenizes to "don t", so it will match the tokenized strings "don't", "don t", "don-t", "don, t", etc ... but not "dont", because that's outside tokenization.
On the other hand, snippets mostly are substring matches of the query text, so if you see a result to a literal query that doesn't have a snippet, you know it's probably one of the weird matches.
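To make the distinction concrete, here's a rough sketch of "exact match on tokenized text". The tokenizer is my own assumption (lowercase, split on anything that isn't a letter or digit), and `tokenize` and `phrase_match` are made-up names; I'm not claiming this is what Google actually runs, it's just enough to reproduce the "don't" example above.

```python
import re

def tokenize(text):
    # Assumed tokenizer: lowercase, then keep runs of letters/digits.
    # Apostrophes, hyphens, commas, and spaces all act as separators.
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_match(query, document):
    # True if the query's tokens appear in the document's tokens
    # as one contiguous run, in the same order (no gaps, no reordering).
    q = tokenize(query)
    d = tokenize(document)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))

query = "don't stop me now"
for doc in ["Don't stop me now", "don t stop me now",
            "don-t stop me now", "don, t stop me now",
            "dont stop me now", "stop me don't now"]:
    print(repr(doc), phrase_match(query, doc))
```

The first four documents match because they all tokenize to `don t stop me now`; "dont" stays a single token so it fails, and the reordered one fails because the run has to be contiguous and in order.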
This is just patently false in addition to being condescending.
If you use quotes around a phrase, it will reorder terms and make substitutions with synonyms in addition to straight up ignoring the quoted phrase no matter how many times you add +. If you then fiddle with settings (randomly not available depending on star alignment and device) to change it to 'verbatim' it will still reorder and split up tokens in the phrase.
In the past that Google search returned no results for any search. Today the set of results is altered before it is presented to the user. Sometimes the set of results is empty and at other times it contains results.
For example:
steve -steve returns 0 results
test -test returns 4,780,000,000 in my search
starting with google/youtube videos
I mean, I'm confident that the core of how it works -- "an exact match between the tokenized document and the tokenized query" -- hasn't changed in a very long time, but I can't really promise there wasn't another aspect I'm ignorant of that is responsible for the behavior you remember that changed somehow.
"Exact tokenized matching" can look like "exact string matching" a lot of the time. Until you hit some of the edge cases it's like kerning: https://xkcd.com/1015/