by ot 4377 days ago
> training and running complex ML is not currently feasible on Streaming Frameworks we have today to use them for both realtime and batch.

Have you had a look at Samoa? It is a streaming machine learning library for Storm and S4.

http://yahoo.github.io/samoa/

How different is it from MLlib (https://spark.apache.org/mllib/)?

I understand that MLlib is strictly a batch-processing library.

I don't know much about MLlib specifically, but I expected to come here and see more comments about Spark. It handles both batch and stream processing relatively well, which lets you reuse a lot of code between the two pipelines. The primary original motivation seems to have been to "beat the CAP theorem" by combining distributed systems with different characteristics, so using Spark for both would defeat that point. Like the author, though, I don't think "beating the CAP theorem" this way will produce results that warrant the work.