Hacker News
A viable replacement for RRD for storing time-series data
1 points by anuj 5078 days ago
RRD is an awesome tool, but it loses data over time because it averages older samples together. I'm looking for something that is almost as efficient as RRD but causes no data loss over time. I'm fine with higher disk space usage.
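To make the averaging loss concrete, here is a minimal Python sketch under a simplified model of RRD consolidation: every `step` raw samples are collapsed into one averaged point, roughly like an archive using the AVERAGE consolidation function. `consolidate` is a hypothetical helper, not part of rrdtool, and real RRD also deals with heartbeats, unknown values and xff, which are ignored here.

```python
def consolidate(samples, step):
    """Collapse every `step` consecutive samples into one averaged point
    (a simplified stand-in for an RRD AVERAGE archive)."""
    out = []
    for i in range(0, len(samples), step):
        chunk = samples[i:i + step]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical one-minute readings with a single short spike.
raw = [10, 10, 10, 95, 10, 10, 10, 10, 10, 10]

five_min = consolidate(raw, 5)

print(max(raw))       # 95: the spike is visible in the raw data
print(five_min)       # [27.0, 10.0]: the spike is averaged down
print(max(five_min))  # 27.0: the original peak cannot be recovered
```

RRD can keep MIN or MAX archives next to the AVERAGE one, but each of those still throws away the individual samples, which is the loss described above.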

We've played around with:

OpenTSDB: http://opentsdb.net/

StatsD: https://github.com/etsy/statsd/ (description here - http://codeascraft.etsy.com/2011/02/15/measure-anything-meas...)

OpenTSDB and StatsD seemed great for getting time-series data in and producing nice dashboards, but they didn't seem to fit our needs for running custom analytics on the data.

At the moment, we're leaning towards Cassandra because of our scalability requirements. Check out http://rubyscale.com/blog/2011/03/06/basic-time-series-with-... and http://www.datastax.com/dev/blog/advanced-time-series-with-c... to get an idea of how Cassandra can help.
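One common way to lay out time series in Cassandra is the "wide row": each row key combines a metric name with a time bucket, and the columns inside the row are timestamps kept in sorted order, so a time-range query is a contiguous slice. The sketch below only models that layout in memory with plain Python; it does not use a Cassandra driver, and `WideRowStore`, the day-sized bucket and the metric name are hypothetical choices for illustration, not taken from the linked posts.

```python
import bisect
from collections import defaultdict

BUCKET_SECONDS = 86_400  # assumption: one row per metric per day


class WideRowStore:
    """In-memory stand-in for a wide-row layout:
    row key = (metric, day bucket); columns = timestamps kept sorted."""

    def __init__(self):
        # (metric, bucket) -> (sorted list of timestamps, dict ts -> value)
        self.rows = defaultdict(lambda: ([], {}))

    def insert(self, metric, ts, value):
        bucket = ts - ts % BUCKET_SECONDS
        timestamps, values = self.rows[(metric, bucket)]
        if ts not in values:
            bisect.insort(timestamps, ts)
        values[ts] = value  # the raw sample is stored as-is, nothing is averaged

    def slice(self, metric, start, end):
        """Return (ts, value) pairs with start <= ts < end, visiting only
        the day rows that overlap the requested range."""
        out = []
        bucket = start - start % BUCKET_SECONDS
        while bucket < end:
            timestamps, values = self.rows.get((metric, bucket), ([], {}))
            lo = bisect.bisect_left(timestamps, start)
            hi = bisect.bisect_left(timestamps, end)
            out.extend((ts, values[ts]) for ts in timestamps[lo:hi])
            bucket += BUCKET_SECONDS
        return out


store = WideRowStore()
store.insert("cpu.load", 100, 0.5)
store.insert("cpu.load", 86_500, 0.7)  # lands in the second day's row
store.insert("cpu.load", 50, 0.3)      # out-of-order insert stays sorted

everything = store.slice("cpu.load", 0, 200_000)
window = store.slice("cpu.load", 60, 90_000)
```

Bucketing by time keeps each row bounded in size, and because every raw sample is kept, new analytics can always be recomputed from the full history, at the cost of disk space.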

Have you played around with MongoDB?
Actually, we started out using Mongo so we could focus on the analysis instead of the schema, but we quickly ran into performance issues as the datasets grew. We were simply using mongoengine for Python, so we didn't spend much time optimizing our schema or implementing things like sharding.

Our performance issues with Mongo largely stemmed from our poor use of indexes: we defined a lot of them because our query patterns evolved organically, and stayed largely undefined, as new analysis requirements came in. Because we frequently had to go back and compute new feature vectors across all (or large parts) of the dataset, we couldn't use many of the aggregation capabilities you'll see implemented in other time-series schemas.
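As a rough illustration of that trade-off: many time-series schemas pre-aggregate samples into per-period summary documents as they arrive. That makes planned queries cheap, but a summary cannot answer a question nobody planned for. This minimal Python sketch uses plain dicts to stand in for Mongo documents; `hourly_rollup` and the `count`/`sum` field names are hypothetical and are not mongoengine API.

```python
import statistics
from collections import defaultdict


def hourly_rollup(samples):
    """Pre-aggregate (timestamp, value) samples into one summary per hour,
    keeping only count and sum, as a rollup-style schema might."""
    rollup = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, value in samples:
        doc = rollup[ts - ts % 3600]  # hour bucket
        doc["count"] += 1
        doc["sum"] += value
    return dict(rollup)


# Hypothetical raw samples: (seconds since start, value).
samples = [(0, 1.0), (600, 2.0), (1200, 9.0), (3600, 4.0), (4200, 4.0)]
rollup = hourly_rollup(samples)

# A question planned for in advance is cheap: the hourly mean
# comes straight from the summary document.
hour0 = rollup[0]
mean0 = hour0["sum"] / hour0["count"]  # 4.0

# A new feature, e.g. the hourly median, cannot be derived from count/sum;
# it needs another pass over the raw samples.
median0 = statistics.median(v for ts, v in samples if ts < 3600)  # 2.0
```

When new feature vectors keep showing up, as described above, the raw samples have to stay queryable anyway, which erodes much of the benefit of maintaining rollups.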

I've seen this presentation and it looks inspiring: http://www.slideshare.net/sky_jackson/time-series-data-stora...