
I had the same problem. I really wanted to like this project, but I found it hard to get a handle on what their goal is. Performant, C++17, open source IR tools sound great to me. But how does someone use this? Are they a parallel to Lucene? ElasticSearch? Grep?

Talking about scale would help a lot here. What's the largest dataset they've indexed? Do they shard across multiple nodes? etc.

I found they had a research paper about PISA at OSIRRC (a replicability challenge for IR?) last year with some details. You can get the paper and slides from the conference site:

https://osirrc.github.io/osirrc2019/

They have run it on datasets like ClueWeb12 (1.5TB web), but the paper was about replicable search and lacked performance comparisons to other systems. It's hard to call yourself performant unless you show you're at least as good as other implementations.