by stereosteve
2178 days ago
This is excellent. I was recently reviewing Lucene concepts and found this video really good:
https://www.youtube.com/watch?v=T5RmMNDR5XI

Also, this site has a series of Lucene articles that are pretty nice, the one on Term Vectors in particular:
http://makble.com/what-is-term-vector-in-lucene

Based on some quick research, it seems like Lucene already uses a sorted skip data structure for the posting list, so I wonder why they had to do a custom implementation? Perhaps it has to do with their custom Document ID scheme and wanting the posting list to keep an order that differs from Lucene's default. It also sounds like searchers are searching on indexes as they're being written, and there is some custom coordination around visibility, which might require diverging from Lucene's default behavior. Either way, pretty impressive!
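
For anyone who hasn't seen the idea, here is a rough sketch of how skip pointers speed up intersecting two sorted posting lists. This is the textbook in-memory version, not Lucene's actual on-disk multi-level format, and the function name and the sqrt-spaced skip interval are my own assumptions for illustration. It also assumes doc IDs within a list are unique and ascending.

```python
import math


def intersect_with_skips(a, b):
    """Intersect two ascending lists of unique doc IDs.

    Every sqrt(n)-th position is treated as having a skip pointer that
    jumps sqrt(n) entries ahead (an illustrative choice, not Lucene's).
    """
    skip_a = max(1, int(math.sqrt(len(a))))
    skip_b = max(1, int(math.sqrt(len(b))))
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Take the skip only if its target does not overshoot b[j];
            # everything jumped over is then strictly smaller than b[j].
            if i % skip_a == 0 and i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            else:
                i += 1
        else:
            if j % skip_b == 0 and j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            else:
                j += 1
    return out


if __name__ == "__main__":
    print(intersect_with_skips([2, 4, 8, 16, 19, 23, 28, 43],
                               [1, 2, 3, 5, 8, 41, 43, 51, 60, 71]))
```

The point is that the order of the list determines what a skip can safely jump over, which is presumably why a custom doc ID ordering would interact with the posting list implementation.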
|
You can specify such an index-level default sort (similar to what they use custom IDs to achieve), and it will use skip lists to make searching with that sort faster. It imposes an indexing overhead, but I would guess that for use cases like this it could make sense.
https://www.elastic.co/blog/index-sorting-elasticsearch-6-0
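
A small sketch of what that looks like, as I understand it from the blog post. The `index.sort.field` and `index.sort.order` settings are the ones the post describes; the `timestamp` field name and the toy `top_k_sorted` function are made up for illustration and are not Elasticsearch code. The function only shows why a pre-sorted index helps: a top-k query sorted the same way can stop as soon as it has k hits.

```python
# Index creation body with an index-level sort.
# "timestamp" is a hypothetical field name.
index_settings = {
    "settings": {
        "index": {
            "sort.field": "timestamp",
            "sort.order": "desc",
        }
    }
}


def top_k_sorted(docs, predicate, k):
    """Toy early termination over docs already stored in sort order.

    Returns the first k matching docs and how many docs were examined.
    """
    hits = []
    examined = 0
    for doc in docs:
        examined += 1
        if predicate(doc):
            hits.append(doc)
            if len(hits) == k:
                break  # later docs cannot rank higher, so stop here
    return hits, examined


if __name__ == "__main__":
    # Docs stored newest first, matching the "desc" sort above.
    docs = [{"timestamp": t} for t in range(10, 0, -1)]
    hits, examined = top_k_sorted(docs, lambda d: d["timestamp"] % 2 == 0, 2)
    print(hits, examined)
```

Without the index sort, the same query would have to score every matching doc before it could know which k come first.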