by otoolep 3772 days ago

Thanks for sharing.

>In the streaming case, documents arrive at a very fast rate (e.g. average of 6000 per second in the case of Twitter) and with this kind of velocity and volume it is impractical to build the inverted document index in real-time.

Actually, it's completely practical. It's expensive, but it can be done. When I was at Loggly we built a big ES cluster, pumped tens of thousands of log messages per second into it, and served queries. Ingest-to-query latency was on the order of seconds, and it still runs today. The key, of course, is not to build a single inverted index.

I don't want to underestimate the work, but it is practical. And expensive.
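To illustrate what "not a single inverted index" can look like, here is a minimal sketch, assuming a segment-per-batch design: new documents go into a small in-memory segment, full segments are sealed, and a query fans out over every segment. The names `Segment`, `SegmentedIndex`, and `max_docs_per_segment` are hypothetical, made up for this example. This is not Loggly's or Elasticsearch's actual code, and a real system would also shard across machines and merge segments in the background.

```python
from collections import defaultdict


class Segment:
    """One small inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.size = 0  # number of documents added to this segment

    def add(self, doc_id, text):
        # Naive whitespace tokenizer; an assumption made for brevity.
        for term in text.lower().split():
            self.postings[term].add(doc_id)
        self.size += 1

    def search(self, term):
        # .get avoids creating empty entries in the defaultdict.
        return self.postings.get(term.lower(), set())


class SegmentedIndex:
    """Hypothetical index made of many small segments instead of one big one."""

    def __init__(self, max_docs_per_segment=1000):
        self.max_docs = max_docs_per_segment
        self.sealed = []         # full segments, never modified again
        self.active = Segment()  # the only segment that accepts writes

    def add(self, doc_id, text):
        self.active.add(doc_id, text)
        if self.active.size >= self.max_docs:
            # Seal the full segment and start a fresh one, so write cost
            # stays bounded by segment size, not by total index size.
            self.sealed.append(self.active)
            self.active = Segment()

    def search(self, term):
        # Query every segment and union the results.
        hits = set()
        for segment in self.sealed + [self.active]:
            hits |= segment.search(term)
        return hits


index = SegmentedIndex(max_docs_per_segment=2)
index.add(1, "disk error on host a")
index.add(2, "login ok")
index.add(3, "network error on host b")
index.add(4, "login failed")
index.add(5, "disk ok")
matches = index.search("error")
```

The point of the design is that a write only ever touches the small active segment, so a document becomes searchable as soon as it is added, and the cost moves to query time, where more segments mean more lookups.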

http://www.slideshare.net/AmazonWebServices/infrastructure-a...