by gadamc 4667 days ago
If your queries are almost always the same and you need to update your sums as new data is added, you could look at the incremental MapReduce of CouchDB/Cloudant. A single CouchDB instance might hold your data, but Cloudant would scale it out for you. For example, I have 10 million documents, each with about 100 key-value pairs, where each value is a number. I have a MapReduce function that calculates the statistics of those values. With Cloudant, those statistics are computed in the Reduce step and updated incrementally, so I always have the latest information. Additionally, I can select date ranges for the statistics and the results are already pre-calculated for me, so the response is very fast.
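
To show why the Reduce step can be incremental, here is a minimal Python sketch (not CouchDB code). It assumes the reduce keeps count, sum, sum of squares, min and max, which are the fields CouchDB's built-in `_stats` reduce returns. The names `Stats`, `from_value`, `merge` and `reduce_values` are made up for illustration. The point is that two partial results can be merged without revisiting the original documents.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Partial statistics that can be merged without the raw values."""
    count: int
    total: float
    sumsqr: float
    minimum: float
    maximum: float


def from_value(x: float) -> Stats:
    # Statistics of a single emitted value.
    return Stats(1, x, x * x, x, x)


def merge(a: Stats, b: Stats) -> Stats:
    # Combine two partial results (the "rereduce" idea).
    return Stats(
        a.count + b.count,
        a.total + b.total,
        a.sumsqr + b.sumsqr,
        min(a.minimum, b.minimum),
        max(a.maximum, b.maximum),
    )


def reduce_values(values: list[float]) -> Stats:
    # Fold a non-empty batch of values into one partial result.
    return reduce(merge, (from_value(v) for v in values))


# Old data was reduced earlier; a new value only needs one merge.
old = reduce_values([1.0, 2.0])
updated = merge(old, from_value(3.0))
```

Because `merge` only needs the partial results, adding a document costs a few merges instead of a full recomputation over all 10 million documents.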

For example: https://edelweiss.cloudant.com/automat/_design/cryo_2/_view/...

These are the stats for a particular measured value between two dates. Change the dates and you'll see that the response comes back quickly.
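
A rough sketch of how such a date-range query might be built in Python. CouchDB views take `startkey` and `endkey` as JSON-encoded query parameters. Everything else here is an assumption: the base URL is a placeholder (not the truncated link above), the helper name `view_url` is made up, and the key layout `[name, unix_timestamp]` is a guess at how the view is keyed.

```python
import json
from urllib.parse import urlencode

# Placeholder view URL, purely illustrative.
BASE = "https://example.cloudant.com/mydb/_design/stats/_view/by_name_time"


def view_url(base: str, name: str, start_ts: int, end_ts: int) -> str:
    """Build a view URL covering [start_ts, end_ts] for one measured value."""
    params = {
        # Keys are assumed to look like [name, unix_timestamp].
        "startkey": json.dumps([name, start_ts], separators=(",", ":")),
        "endkey": json.dumps([name, end_ts], separators=(",", ":")),
    }
    return base + "?" + urlencode(params)


# Hypothetical one-hour window.
url = view_url(BASE, "T_Bolo", 1378940038, 1378943638)
```

Changing the date range only changes the two keys; the reduced values for the range come from what the view has already computed.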

Change startkey to "T_Bolo", 1378940038 and you'll get the statistics of that value for the last ~hour. You get the sum, but also the average and standard deviation.
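
For reference, the average and standard deviation can be derived from just three accumulated numbers: count, sum and sum of squares. The Python sketch below assumes the population form of the standard deviation; whether the view linked above uses the population or sample form is not stated here. The function name `mean_and_std` is hypothetical.

```python
import math


def mean_and_std(count: int, total: float, sumsqr: float) -> tuple[float, float]:
    """Mean and population standard deviation from accumulated sums."""
    mean = total / count
    # Var = E[x^2] - (E[x])^2; clamp tiny negatives caused by rounding.
    variance = max(0.0, sumsqr / count - mean * mean)
    return mean, math.sqrt(variance)


# Values 2, 4, 4, 4, 5, 5, 7, 9: count=8, sum=40, sum of squares=232.
mean, std = mean_and_std(8, 40.0, 232.0)
```

This is why the reduce only has to carry a handful of numbers per key range, no matter how many documents fall inside it.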

full disclosure: I work at Cloudant.

cheers, Adam