by firemanphil 4110 days ago
Presently, most of our collaborative filtering (CF) algorithms use Apache Spark, but we intend to stay agnostic on this and allow any machine learning platform to be integrated. I believe that Spark can easily handle a data set of this size.

With regard to horizontal scaling, there are two parts to consider: creating the models, and serving the recommendations (the Seldon Server project).

Model creation is done in a variety of ways, but it can be managed with scalable technologies such as Spark.

The Seldon Server project can be deployed on as many machines as you require, and they will work together to provide recommendations behind a load balancer. We have experience working with some very large news websites, so this part of our technology is well developed.

1 comment

Great, thanks for your feedback. I will set up an installation in my home lab this weekend. Super excited!