I think it was roughly 300 million events/day (about 1 KB per event). Logstash adds some overhead (turning a log into JSON, parsing it into fields), and so does Elasticsearch (analyzing/indexing data).
In practical terms, and by way of example: a plain-text Apache access log, fully parsed by logstash (breaking out fields, etc.), has historically bloated by quite a bit (I have measured 6.2x). Lately, however, with improvements to logstash, better default settings, and elasticsearch being awesome, the 'inflation' number comes down to something more like 1.5x, which isn't bad considering all the awesome you get with it.
Long term, I am working towards making the 'stored data to raw data' ratio something less than 1x, i.e. the indexed data taking less space than the original logs.
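To put those ratios in disk terms, here is a rough back-of-the-envelope sketch. It assumes the ~300 million events/day and ~1 KB/event figures above (taking 1 KB as 1000 bytes), and it assumes the 6.2x and 1.5x inflation numbers can simply be multiplied against that volume, which is an illustration only since those ratios were measured on Apache access logs and not necessarily on this workload. The function names are made up for the example.

```python
# Back-of-the-envelope storage estimate using the figures from this comment.
EVENTS_PER_DAY = 300_000_000   # "roughly 300 million events/day"
BYTES_PER_EVENT = 1_000        # "1kb per event", assumed here to mean 1000 bytes


def stored_bytes_per_day(inflation):
    """Bytes on disk per day for a given stored/raw inflation ratio."""
    return EVENTS_PER_DAY * BYTES_PER_EVENT * inflation


def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


if __name__ == "__main__":
    for label, ratio in [("raw", 1.0), ("older, 6.2x", 6.2), ("recent, 1.5x", 1.5)]:
        print(f"{label}: {to_gb(stored_bytes_per_day(ratio)):.0f} GB/day")
```

Under those assumptions, that works out to about 300 GB/day raw, about 1860 GB/day at 6.2x, and about 450 GB/day at 1.5x.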
You can see some experiments I did a year ago on this: https://github.com/jordansissel/experiments/blob/master/elas...
I will repeat these experiments after the next release of logstash, and I expect storage ratios to improve significantly.