by dacort
5528 days ago
While I understand that real-time full-text indexing is a much more difficult problem to solve, I've got just under 1.5 billion tweets "indexed" in TweetStats. And I'm one person. Granted, given the 30MM/day number, they must be growing that index very quickly, and they've likely hit the 1.5 billion mark pretty darn quickly.
Solve?
Greplin has probably not built their own search technology. I'd guess they're simply running Lucene or Sphinx like everyone else.
Their index is still small by search standards, as you can tell from TechCrunch having to reach 10 years back to make an "impressive" analogy.
Today, 1.5 billion documents translates to a few terabytes of data (probably in the high single digits). 30 million indexed/day works out to roughly 350/sec. You could store and process all that on a single beefy box, or spread it out over a couple of Amazon instances.
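The back-of-envelope arithmetic above can be checked with a few lines of Python. This is only a sketch: the 4 KB per indexed document (text plus index overhead) is a hypothetical figure, picked to show how 1.5 billion documents lands in the "high single digit" terabyte range. The comment doesn't state a per-document size.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def ingest_rate_per_second(docs_per_day: int) -> float:
    """Average documents indexed per second, assuming a uniform rate over the day."""
    return docs_per_day / SECONDS_PER_DAY


def total_size_tb(num_docs: int, bytes_per_doc: int) -> float:
    """Total storage in (decimal) terabytes for num_docs of bytes_per_doc each."""
    return num_docs * bytes_per_doc / 1e12


if __name__ == "__main__":
    rate = ingest_rate_per_second(30_000_000)
    print(f"{rate:.0f} docs/sec")
    # Hypothetical: ~4 KB per indexed document, including index overhead.
    print(f"{total_size_tb(1_500_000_000, 4_000):.1f} TB")
```

Real traffic is bursty, so peak ingest would be some multiple of this average, but even several thousand small documents per second is well within reach of a single well-provisioned machine.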
But yes, in 2001 this would have been impressive. In 2001 you'd pay $150 for a 40 GB hard drive...