
by dtunkelang 1066 days ago

In general, the two ways to compute counts are top-down, by making a separate query for each filter, or bottom-up, by scanning the results and aggregating the counts, like a group-by. Top-down is good for a small universe of values, but bottom-up tends to be the scalable approach. And, as has been pointed out, you can produce approximations by aggregating a sample of the results -- as long as it is a representative random sample. Just be mindful of statistics, particularly confidence intervals.
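To make the distinction concrete, here is a rough Python sketch of the three approaches. Everything in it is an assumption for illustration: results are plain dicts, the function names are made up, and the interval is a simple normal approximation for a 95% confidence level, which gets unreliable for rare facet values or small samples.

```python
import math
import random
from collections import Counter


def top_down_counts(docs, field, values):
    """Top-down: one separate pass (standing in for a query) per filter value."""
    return {v: sum(1 for d in docs if d[field] == v) for v in values}


def bottom_up_counts(docs, field):
    """Bottom-up: a single scan over the results, aggregated like a GROUP BY."""
    return Counter(d[field] for d in docs)


def sampled_counts(docs, field, sample_size, seed=0, z=1.96):
    """Estimate counts from a uniform random sample of the results.

    Returns {value: (estimate, low, high)}, where low/high come from a
    normal-approximation confidence interval on the sampled proportion.
    Values that never appear in the sample are simply absent.
    """
    rng = random.Random(seed)
    n = len(docs)
    k = min(sample_size, n)
    sample = rng.sample(docs, k)
    estimates = {}
    for value, hits in Counter(d[field] for d in sample).items():
        p = hits / k
        half_width = z * math.sqrt(p * (1 - p) / k)
        low = max(0.0, p - half_width) * n
        high = min(1.0, p + half_width) * n
        estimates[value] = (p * n, low, high)
    return estimates
```

The top-down version costs one pass per value, which is why it only makes sense for a small universe of values. The bottom-up version costs one pass total no matter how many values the facet has.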

A related issue is that counts tend to treat all results as equal. If you retrieve a lot of results but most of them are not relevant -- as can happen with full-text search -- then the counts can be misleading. You may have the converse problem if your retrieval excludes a lot of relevant results. So, if you are implementing a faceted search application where you use and show counts, you should keep in mind that it will only work if your retrieval does a reasonable job of balancing precision and recall.
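A toy illustration of how low-precision retrieval skews counts, assuming each result carries a hypothetical relevance `score` and that a cutoff is a fair proxy for relevance (a big assumption in practice):

```python
from collections import Counter


def facet_counts(results, field, min_score=0.0):
    """Count facet values, optionally ignoring results below a relevance cutoff.

    `score` and `min_score` are illustrative assumptions, not any engine's API.
    """
    return Counter(r[field] for r in results if r["score"] >= min_score)


# Hypothetical results: brand "b" matches the query text, but only weakly.
results = [
    {"brand": "a", "score": 0.9},
    {"brand": "a", "score": 0.8},
    {"brand": "b", "score": 0.2},
    {"brand": "b", "score": 0.1},
    {"brand": "b", "score": 0.1},
]

raw = facet_counts(results, "brand")                      # every match counts equally
filtered = facet_counts(results, "brand", min_score=0.5)  # only reasonably relevant matches
```

In the raw counts "b" looks like the dominant brand, while after the cutoff it disappears entirely. Set the cutoff too aggressively and you get the converse problem: relevant results drop out and the counts understate.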

Finally, remember that supply != demand. The distribution of a facet in your index may be different from the distribution of that facet in searcher intent. A bit more on that here: https://dtunkelang.medium.com/search-intent-not-inventory-28...