|
The problem is generally called "deep pagination". It's extremely inefficient to compute. Specifically, counting requires very low memory. When data is spread across 10,000 computers, all of them counting returns just 10,000 numbers i.e. 4 bytes * 10,000 = 40KB. It's easy for 1 computer to count those 10,000. Even at 100,000 computers 400KB. Merging sorted search results is extremely memory intensive. Even with just the Id+Score pair, let's say 8 bytes. To get the 10,000th search result, each computer needs to create a List of 10,000 results, thats 10,000 * 10,000 * 8 bytes = 800 MB. For the 100,000th search result 10,000 * 100,000 * 8 bytes = 8 GB. OR if your data grows to 100,000 computers, thats 100,000 * 100,000 * 8 bytes = 80 GB of intermediate results to process at the end. As you can see this doesn't scale well. You're required to retain context (i.e. sessions) of the search in memory instead, and get the search engine to better coordinate across all 100,000 computers. This also has scaling limitations based on memory of the session, the number of computers, the number of sessions, and their TTL (someone can leave the search page open for day and hit "next page" - should the sessions still be open? Thats an answer each search engine has to decide). The reality is, if a customer wants deep pagination, they are better suited to a full data dump (i.e. full table scan) or using an async search API, rather than a sync search API. |