by mywittyname 2013 days ago

> What is the reason behind us not having deployed massive amounts of search indices into the world that were built using this method?

There are a lot of other strategies for segmented data indexes, many of which are roughly equivalent to this in practice. For example, you can use a computed strategy for distributing data among the nodes in a cluster. In that setup, a master node runs a computation on an indexed value to identify which node(s) in the cluster contain the data, then sends the query request only to those nodes.
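As a rough illustration of the computed-distribution idea, here is a minimal sketch that assumes plain hash-modulo placement. The names `nodes_for_key` and `Cluster` are made up for this example, and real systems often use consistent hashing or range partitioning instead.

```python
import hashlib


def nodes_for_key(key: str, num_nodes: int, replicas: int = 1) -> list[int]:
    """Compute which node ids are assumed to hold `key` (hash modulo node count)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % num_nodes
    # Replicas, if any, are placed on the nodes following the primary.
    return [(primary + i) % num_nodes for i in range(replicas)]


class Cluster:
    """Toy in-memory cluster; each node is just a dict."""

    def __init__(self, num_nodes: int, replicas: int = 1):
        self.num_nodes = num_nodes
        self.replicas = replicas
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key: str, value) -> None:
        for node_id in nodes_for_key(key, self.num_nodes, self.replicas):
            self.nodes[node_id][key] = value

    def get(self, key: str):
        # The "master" computes the target nodes and queries only those,
        # instead of broadcasting the request to the whole cluster.
        for node_id in nodes_for_key(key, self.num_nodes, self.replicas):
            if key in self.nodes[node_id]:
                return self.nodes[node_id][key]
        return None
```

The point is that the lookup never touches a stored index: the location is recomputed from the value itself each time.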

I would think the biggest drawback to this particular strategy is that you often need to analyze the data in the index to get an answer anyway, so it's not really beneficial to throw away the index data. For example, if the indexed field is a timestamp, and the result set is often sorted on that field, then it makes total sense to keep the index values and use them to perform the sorting while a separate thread fetches the data needed to flesh out the query.
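To make the timestamp case concrete, here is a small sketch under some assumptions: the index is a sorted list of `(timestamp, row_id)` pairs kept in memory, and `ROWS`, `INDEX`, `fetch_row`, and `query_range` are hypothetical names standing in for a real table, index, and storage read.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

# Hypothetical row store, keyed by row id.
ROWS = {
    101: {"ts": 30, "msg": "c"},
    102: {"ts": 10, "msg": "a"},
    103: {"ts": 20, "msg": "b"},
    104: {"ts": 40, "msg": "d"},
}

# The index keeps the timestamp values, already sorted.
INDEX = sorted((row["ts"], row_id) for row_id, row in ROWS.items())


def fetch_row(row_id: int) -> dict:
    """Stand-in for a slow read from storage."""
    return ROWS[row_id]


def query_range(start_ts: int, end_ts: int) -> list[dict]:
    """Return rows with start_ts <= ts < end_ts, sorted by timestamp."""
    # The ordering comes from the index alone; no row data is needed yet.
    lo = bisect.bisect_left(INDEX, (start_ts,))
    hi = bisect.bisect_left(INDEX, (end_ts,))
    ordered_ids = [row_id for _, row_id in INDEX[lo:hi]]
    # Worker threads fetch the row bodies; map() yields results in input
    # order, so the order taken from the index is preserved.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fetch_row, ordered_ids))
```

If the index values had been thrown away, the same query would have to fetch every candidate row first and sort afterwards.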