by derefr 1553 days ago
> It cannot be so difficult for them to keep things like literal search, can it?

Greater scale = greater cost of keeping data hot in their search data-warehouses (esp. in light of contention over memory/caches). Keeping around both a source-text string and its tsvector representation (or whatever Google's version of that is) is a "thing that doesn't scale": something they could provide at 1B queries/day, but probably not at 10B queries/day.
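To make the point about keeping both representations concrete, here is a minimal sketch. It assumes a toy normalizer (`toy_vector`, a hypothetical stand-in that lowercases, drops a few stop words, and strips a trailing "s"); it is not PostgreSQL's actual tsvector logic, and certainly not whatever Google runs. It only shows that once text is reduced to normalized terms, a literal phrase search can no longer be answered without the original string.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "to", "is"}

def toy_vector(text: str) -> set[str]:
    """Reduce text to a set of normalized terms (toy stand-in for a tsvector)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Crude "stemming": strip a trailing 's' from longer words.
    return {t[:-1] if t.endswith("s") and len(t) > 3 else t
            for t in tokens if t not in STOP_WORDS}

def vector_match(query: str, doc_text: str) -> bool:
    """True if every normalized query term appears in the document's vector."""
    q = toy_vector(query)
    return bool(q) and q <= toy_vector(doc_text)

def literal_match(query: str, doc_text: str) -> bool:
    """Literal search: needs the original source text, not just the vector."""
    return query in doc_text

query = "potato skins"
doc_a = "Crispy potato skins with oil"
doc_b = "The skin of a potato"

# Both documents match on normalized terms...
print(vector_match(query, doc_a), vector_match(query, doc_b))
# ...but only one contains the literal phrase, which the vector alone cannot tell you.
print(literal_match(query, doc_a), literal_match(query, doc_b))
```

The index has to hold `doc_a` and `doc_b` themselves, in addition to their vectors, for `literal_match` to work at all; that duplicated storage is the cost being described.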

> the algorithm that has encouraged stupid amounts of articles of a certain length. Recipe for baked potatoes is now 2000 words long.

That's not the algorithm's fault per se; it follows from the fact that recipes can't be copyrighted. These sites can freely steal + repost one another's recipes, so you'll find the same recipe word-for-word on many sites, and an exact match in the recipe part therefore doesn't contribute much to ranking any particular site. The 2000-word blog post, on the other hand, is actual Intellectual Property unique to the site posting it. It only appears in the one place, so when your query matches it, it ranks quite highly indeed.
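A rough way to see this effect is plain inverse document frequency (IDF), a textbook weighting that is swapped in here purely for illustration; it is not a claim about how Google actually ranks. The site names and term sets below are made up. Terms that appear on every site contribute nothing to the score, while a term unique to one site's blog text carries all the weight.

```python
import math

def idf_scores(query_terms, site_terms):
    """Score each site by summing log(N / df) over the query terms it contains."""
    n = len(site_terms)
    # Document frequency: on how many sites does each query term appear?
    df = {t: sum(t in terms for terms in site_terms.values()) for t in query_terms}
    return {
        site: sum(math.log(n / df[t]) for t in query_terms if t in terms)
        for site, terms in site_terms.items()
    }

# Hypothetical data: the same copied recipe everywhere, plus per-site blog filler.
recipe = {"bake", "potato", "oil", "salt"}
sites = {
    "site-a": recipe | {"grandmother", "farmhouse"},
    "site-b": recipe | {"camping", "trip"},
    "site-c": recipe,
}

scores = idf_scores(["potato", "bake", "camping"], sites)
for site, score in scores.items():
    print(site, round(score, 3))
```

Under these assumptions the shared recipe terms score zero for every site, and only the site whose unique prose happens to match the query rises above the rest.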


> That's not the algorithm's fault per se;

Yes, it is. There are good recipe sites out there with authoritative, reliable content and fast loading times. Google says it prioritizes those things, I can identify sites that have them, and yet the algorithm doesn't favour them. That's the algorithm's fault no matter what memes about copyright law cause a proliferation of shitty websites.

What I'm saying is that the "recipe" part of a recipe website is a commodity – there is no "authoritative" source for a given recipe, unless that recipe is too niche in appeal to end up widely disseminated. This video (https://www.youtube.com/watch?v=SsNLzyqqINw) has pretty good coverage of the topic.

Compare and contrast: phone-number directory listings. Who should Google cite as the authoritative source for lists of name-to-phone-number associations? Nobody. All the lists are copying from each other, curating and correcting the data taken from one another, gathering their own original data for additions, and everything in between. Every portal overlaps every other portal, and they mostly have the same stuff.

Compare and contrast, in the physical world: printings of public-domain literature. If Google indexed bookstores, which printing by which publisher would you want them to rank first on a search for e.g. Pride and Prejudice?

Try Kagi.com; you can rank domains however you want.
What I really want is biased search results of my choosing.

$10 a month for a personal search is a bit much. $10 a month for work related search is cheap. Give me results specific to my industry without having a super long query.

That's what Kagi lenses are for. Just try...
(Neeva team member here) re: recipes. You might like the Neeva recipe search experience. You can see an entire recipe and reviews (without the ads or intro text) without navigating away from the search results page. Quick example here: https://neeva.com/search?q=baked+potato&src=nvobar