by rubyn00bie
2291 days ago
Ugh, I hate it when READMEs have zero useful information and literally only tell you how to cite them as a source... this is a big problem with C++ libraries. I 100% think the author(s) deserve credit for their work, and I'm sure it's brilliant, but I'll never use your work if I don't know what it does. How does this compare to anything, other than being fast? Why should I use it? Can't you include a few examples, use cases, or reasons why it exists? Is it in memory, distributed, on disk... when is it fast? Does it have a binary I can run? [clicks around documentation site] ... oh shit there is a binary of something, maybe multiple... how about showing those in the README? Or maybe talk about what algorithms are actually implemented instead of just saying lots of them are.
Talking about scale would help a lot here. What's the largest dataset they've indexed? Do they shard across multiple nodes? etc.
I found they had a research paper about PISA at OSIRRC (a replicability challenge for IR?) last year with some details. You can get the paper and slides off the conference site:
https://osirrc.github.io/osirrc2019/
They have run it on datasets like ClueWeb12 (1.5TB web), but the paper was about replicable search and lacked performance comparisons to other systems. It's hard to call yourself performant unless you show you're at least as good as other implementations.