by karterk
4724 days ago
Hard to offer suggestions without knowing the rough size of the data. Depending on how much money you're willing to cough up, even 1 TB is in "can fit in memory" territory. Having said that, Spark is really great for running iterative algorithms and will definitely fit what you have described. I suggest staying away from building it on your own using Riak/Redis (at least until you have ruled out Spark), as you will run into lots of operational issues like handling failures, resource allocation, retries, etc.
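
To give a feel for the plumbing a home-grown setup ends up owning, here is a minimal sketch of just the retry piece, in plain Python. The names `run_with_retries` and `process_all` are made up for illustration and are not part of Spark, Riak, or Redis; the sketch also assumes each chunk can simply be recomputed after a failure, which fits data that never changes.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_with_retries(
    task: Callable[[T], R],
    chunk: T,
    max_attempts: int = 3,
    backoff_s: float = 0.0,
) -> R:
    """Run task(chunk), retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(backoff_s * attempt)  # linear backoff before the next try
    raise ValueError("max_attempts must be at least 1")


def process_all(
    task: Callable[[T], R],
    chunks: List[T],
    max_attempts: int = 3,
) -> List[R]:
    """Apply task to every chunk in order, retrying each chunk independently."""
    return [run_with_retries(task, chunk, max_attempts) for chunk in chunks]
```

This covers only one worker retrying its own chunk. Noticing dead machines, rescheduling their work elsewhere, and dividing memory between jobs are all still missing, and that is the part a framework like Spark already handles.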
We frequently run different processing algorithms over the entire stored dataset (the stored data doesn't change) and update the calculated features each time. Not sure if this helps narrow things down. Thanks