Hacker News
Is Google lying about total number of results? (useloom.com)
6 points by zabi_rauf 2696 days ago
2 comments

Yes. Often they initially claim that there are many thousands of results, but when you get to page 5, it claims that there are only 47 results (for example).
Do you also consider it a lie when PostgreSQL's planner starts a query by assuming that it'll deliver thousands of rows, but once it's run the query to completion there are only 47?

PostgreSQL does that because the initial number is based on samples, statistics and heuristics, and is produced by the query planner as a side effect of making the query execution plan. I imagine Google, too, has a query planner.
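To make the "samples and statistics" point concrete, here is a minimal sketch of estimating a row count from a random sample instead of scanning everything. This is only an illustration: the names estimate_matches and exact_matches are made up for this example, and it is not how PostgreSQL's or Google's planner is actually implemented.

```python
import random


def estimate_matches(rows, predicate, sample_size=1000, seed=0):
    """Estimate how many rows satisfy `predicate` by scaling up a random sample.

    Hypothetical helper for illustration only, not a real planner API.
    """
    if not rows:
        return 0
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(rows, min(sample_size, len(rows)))
    hits = sum(1 for row in sample if predicate(row))
    # Scale the hit rate in the sample up to the full table size.
    return round(hits / len(sample) * len(rows))


def exact_matches(rows, predicate):
    """Count matches the expensive way, by visiting every row."""
    return sum(1 for row in rows if predicate(row))


rows = list(range(100_000))


def is_multiple_of_ten(row):
    return row % 10 == 0


print("estimate:", estimate_matches(rows, is_multiple_of_ten))
print("exact:   ", exact_matches(rows, is_multiple_of_ten))
```

The estimate only looks at 1,000 rows, so it is cheap but will usually be off by some amount; the exact count has to touch all 100,000.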

Right, and it'd be pretty crazy to trawl through all results just to come up with an accurate count; it would be a huge waste of energy (even leaving costs aside). But I do think their predictions have gotten a lot worse in the last few years. I pretty often have searches where the first page of results indicates there are millions of records, and the second page basically has one or two; and that's a bad experience.
I agree with you, but from my experience the total number of results is almost definitely inflated... either that or it is completely meaningless. (i.e., about the only time the result count is not inflated is when there are 0 results, and I have never seen them underestimate.)
There's a good reason to expect that it usually will be an overestimate. Two reasons, actually.

One, Google has a strong incentive to avoid underestimates: underestimates carry a fairly high risk of causing RAM/CPU problems, since the actual work needed is higher than estimated, and that's an evil ugly problem when processing input from potentially malevolent anonymous users on the net.

Two, the simplest algorithm to compute the estimate is one that'll overestimate if you pick an unusual combination of search terms, such as two words that hardly ever occur together. The planner's statistics will know how common each search term is, but may not know that this particular combination is very rare.
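A rough sketch of that second point, assuming the "simplest algorithm" is to multiply per-term frequencies as if the terms were independent. That assumption is mine, as are the function names and the toy corpus; nothing here is known to be what Google does.

```python
def estimate_conjunction(total_docs, doc_freqs):
    """Estimate how many documents contain ALL terms.

    Assumes terms occur independently of each other, which is exactly
    the assumption that breaks for words that rarely appear together.
    Hypothetical helper for illustration only.
    """
    estimate = float(total_docs)
    for doc_freq in doc_freqs:
        estimate *= doc_freq / total_docs
    return estimate


def actual_conjunction(docs, terms):
    """Count documents that really contain every term (the expensive way)."""
    return sum(1 for doc in docs if all(term in doc for term in terms))


# Toy corpus: 1000 documents, each represented as a set of words.
# "quantum" appears in docs 0-99, "knitting" in docs 500-599, never together.
docs = []
for i in range(1000):
    words = {"the"}
    if i < 100:
        words.add("quantum")
    if 500 <= i < 600:
        words.add("knitting")
    docs.append(words)

# Per-term statistics of the kind a planner might keep.
freq_quantum = sum(1 for doc in docs if "quantum" in doc)
freq_knitting = sum(1 for doc in docs if "knitting" in doc)

estimated = estimate_conjunction(len(docs), [freq_quantum, freq_knitting])
actual = actual_conjunction(docs, ["quantum", "knitting"])
print("estimated:", estimated, "actual:", actual)
```

Each word on its own is in 10% of the documents, so the independence estimate predicts a handful of documents with both, while the real count is zero. The per-term statistics are correct; it's the combination that they can't see.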

No, I don't think so. Searches follow the same algorithms no matter what word is searched, but some results might be excluded from the search according to specific settings.