by tmikaeld 1524 days ago
But we're talking about petabytes of text comments, and an index of that would be a lot larger. How do you access that data fast enough to enable search?

Succinct full-text indexes can be substantially smaller than the source text. How small depends on the empirical entropy of the text: the simplest variants approach its zero-order entropy, and variants built for repetitive data shrink further, so for a highly repetitive corpus a very small index might be feasible. Lookup time is usually linear in the query length, with logarithmic factors in database size.
I've yet to see such a system in production, except for Sonic, but Sonic doesn't allow full-text search, only search on a key-by-key basis.
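
To make the "linear in query size" point concrete, here is a rough sketch of backward search over a toy FM-index, one common kind of succinct full-text index. It is an illustration under assumptions, not how Sonic or any production system works: the function names are made up, the suffix array is built naively, and the rank table is stored uncompressed, so this version is larger than the text. Real implementations swap the table for compressed rank structures and sample the suffix array, which is where the logarithmic factors come from.

```python
def build_fm_index(text):
    """Build a toy FM-index: suffix array, C table, and occurrence table."""
    # Assumption: "\0" never appears in the text and sorts before every character.
    text += "\0"
    # Naive suffix array construction; fine for a demo, far too slow for real data.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # Burrows-Wheeler transform: the character preceding each sorted suffix.
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c] = number of characters in the text strictly smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i] (uncompressed rank table).
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return sa, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open suffix array range [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    # One step per pattern character, processed right to left.
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def count(pattern, sa, C, occ):
    """Number of occurrences of pattern in the indexed text."""
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return hi - lo


def locate(pattern, sa, C, occ):
    """Sorted start positions of pattern in the indexed text."""
    lo, hi = backward_search(pattern, C, occ, len(sa))
    return sorted(sa[lo:hi])


sa, C, occ = build_fm_index("abracadabra")
print(count("abra", sa, C, occ))   # 2
print(locate("abra", sa, C, occ))  # [0, 7]
```

The loop in `backward_search` runs once per query character and never scans the text itself, so counting matches costs the same number of steps no matter how large the indexed corpus is; only the cost of each rank lookup grows once the table is compressed.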