
real-time full-text indexing much more difficult problem to solve

Solve?

Greplin has probably not built their own search technology. I'd guess they're simply running Lucene or Sphinx like everyone else.

Their index is still small by search standards, as you can tell from TechCrunch having to reach 10 years back to make an "impressive" analogy.

Today, 1.5 billion documents translates to a few terabytes of data (probably high single digits). 30 million documents indexed per day works out to roughly 350/sec. You could store and process all that on a single, beefy box, or spread it out over a couple of Amazon instances.
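For anyone who wants to check the back-of-envelope numbers, here is a rough sketch. The per-second rate is just the daily count divided by 86,400. The 2 TB and 9 TB totals are my own assumed bounds for "a couple terabytes, high single digit", not figures from Greplin, and the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TB = 10**12                     # decimal terabyte, in bytes


def docs_per_second(docs_per_day: float) -> float:
    """Average indexing rate if the daily load were spread evenly."""
    return docs_per_day / SECONDS_PER_DAY


def bytes_per_doc(total_bytes: float, doc_count: float) -> float:
    """Average stored size per document implied by a total index size."""
    return total_bytes / doc_count


if __name__ == "__main__":
    rate = docs_per_second(30_000_000)
    print(f"{rate:.0f} docs/sec on average")

    # Assumed bounds for the total data size; not published numbers.
    for size_tb in (2, 9):
        per_doc = bytes_per_doc(size_tb * TB, 1_500_000_000)
        print(f"{size_tb} TB total -> about {per_doc:.0f} bytes per document")
```

This is an average only; real traffic is bursty, so peak rates would presumably be higher.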

But yes, in 2001 this would have been impressive. In 2001 you'd pay $150 for a 40 GB hard drive...
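To put the 2001 price in perspective, a quick sketch of the raw disk cost at $150 per 40 GB drive. The 2 TB and 8 TB index sizes are assumptions picked for illustration, and this counts bare drives only (no redundancy, servers, or power).

```python
import math

DRIVE_PRICE_USD = 150  # 2001 price quoted above
DRIVE_SIZE_GB = 40     # 2001 capacity quoted above


def disk_cost_2001_usd(total_gb: float) -> int:
    """Cost of enough 40 GB drives to hold total_gb, whole drives only."""
    drives = math.ceil(total_gb / DRIVE_SIZE_GB)
    return drives * DRIVE_PRICE_USD


if __name__ == "__main__":
    # Assumed index sizes in decimal GB; not published figures.
    for total_gb in (2_000, 8_000):
        print(f"{total_gb} GB -> ${disk_cost_2001_usd(total_gb):,} in drives")
```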