by marginalia_nu 1636 days ago
I run three separate indices at about 10-20mn documents each. But I'm fairly far off any sort of limit (RAM- and disk-wise I'm at maybe 40%).

I'm confident 100mn is doable with the current code, maybe .5bn if I did some additional space optimization. There is some low-hanging fruit that seems very promising. Sorted integers are highly compressible, and right now I'm not compressing them at all.
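
To illustrate why sorted integers compress well, here is a minimal sketch of one common approach: store the gaps between consecutive IDs and pack each gap as a variable-length integer. This is not taken from the index discussed above; the function names `encode_doc_ids` and `decode_doc_ids` are hypothetical, and the sketch assumes non-negative IDs sorted in ascending order.

```python
def encode_doc_ids(doc_ids):
    """Delta-encode an ascending list of non-negative ints, then varint-pack the gaps."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        gap = doc_id - prev  # small when IDs are dense and sorted
        prev = doc_id
        # Emit 7 bits per byte, low bits first; the high bit marks "more bytes follow".
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)


def decode_doc_ids(data):
    """Inverse of encode_doc_ids: unpack the varints and sum the gaps."""
    doc_ids = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating this gap
        else:
            prev += gap
            doc_ids.append(prev)
            gap = 0
            shift = 0
    return doc_ids


ids = [3, 7, 130, 131, 100000]
packed = encode_doc_ids(ids)
assert decode_doc_ids(packed) == ids
```

Stored as fixed 8-byte integers these five IDs would take 40 bytes; with this scheme most gaps fit in a single byte. Real gains depend on how dense the document IDs are.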

1 comments

Yes, doclist compression is a must: higher intersection throughput and less bandwidth stress. Are you loading your doclists from persistent storage? What is your current max rps?
I'm loading the data off a memory-mapped SSD. Trivial queries will probably be answered entirely from memory, although the disk-read performance doesn't seem terrible either.
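
For readers unfamiliar with memory mapping, a rough sketch of the idea using Python's standard `mmap` module follows. The file layout (little-endian unsigned 64-bit IDs) and the helper names `write_doclist` and `read_doc_id` are assumptions made for illustration, not a description of the actual index format.

```python
import mmap
import struct


def write_doclist(path, doc_ids):
    """Write doc IDs as fixed-width little-endian unsigned 64-bit integers."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(doc_ids)}Q", *doc_ids))


def read_doc_id(path, index):
    """Read one doc ID through a read-only memory map of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages actually touched are read from disk; pages the
            # OS already holds in its page cache are served from memory.
            return struct.unpack_from("<Q", mm, index * 8)[0]
```

A long-running server would typically keep the mapping open instead of reopening it per lookup; the sketch reopens it only to stay short.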

> What is your current max rps?

It depends on the complexity of the request, and repeated retrievals are cached, so I'm not even sure there is a good answer to this.
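
As a rough illustration of why caching blurs any single rps figure, here is a small sketch using `functools.lru_cache`. The toy `INDEX` and the `lookup` function are hypothetical and exist only for this example.

```python
from functools import lru_cache

# Hypothetical toy index: term -> sorted tuple of doc IDs.
INDEX = {"search": (1, 4, 9), "engine": (4, 9, 12)}


@lru_cache(maxsize=1024)
def lookup(terms):
    """Return doc IDs containing every term; `terms` must be a tuple so it is hashable."""
    postings = [set(INDEX.get(t, ())) for t in terms]
    if not postings:
        return ()
    return tuple(sorted(set.intersection(*postings)))


first = lookup(("search", "engine"))   # computed by intersecting the postings
second = lookup(("search", "engine"))  # served from the cache
assert first == second
```

A repeated query costs only a cache lookup, while a new multi-term query pays for the full intersection, so measured throughput depends heavily on the query mix.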