by firemanphil 4110 days ago
Hi, I'm a developer on this project so I can answer any technical questions you may have.
4 comments

Hey, this looks sweet. I'm still new to this field, but I'm looking to do some collaborative filtering across large datasets (4 billion rows+).

I was looking at using Prediction.IO and this looks suitable too. Can you elaborate on the high-level differences between this project and Prediction.IO?

I'm also curious about horizontal scaling.

The main differences between Seldon and other open-source prediction engines:

* Seldon comes from a background of high-scale enterprise deployments and was released as an open-source platform afterwards, not the other way around. In other words, we have already optimized for real-time, low-latency, high-throughput production environments.

* Seldon provides an end-to-end componentized setup, i.e. front-end UI, real-time prediction server, offline machine learning jobs, web scraping, etc.

* Seldon allows developers to run A/B tests and change algorithms with no downtime.

* Seldon will provide enterprise features like automated performance optimization of customers' own algorithms using microservices.

Presently, most of our CF algorithms use Apache Spark, but we intend to be agnostic on this and allow any machine learning platform to be integrated. I believe that Spark can easily handle a data set of this size.
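
For readers new to collaborative filtering, here is a minimal sketch of the item-based variant in plain Python. It is a toy illustration only: it is not Seldon's implementation, it does not use Spark, and the function names `item_similarities` and `recommend` are made up for this example. At billions of rows you would want a distributed implementation instead.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between items that share at least one user.

    ratings: dict of user -> dict of item -> score (hypothetical layout).
    Returns a dict of (item_a, item_b) -> similarity.
    """
    # Invert the data to item -> {user: score}.
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, score in items.items():
            by_item[item][user] = score

    sims = {}
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            common = set(by_item[a]) & set(by_item[b])
            if not common:
                continue
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            norm_a = sqrt(sum(v * v for v in by_item[a].values()))
            norm_b = sqrt(sum(v * v for v in by_item[b].values()))
            sims[(a, b)] = dot / (norm_a * norm_b)
    return sims


def recommend(ratings, user, top_n=3):
    """Rank items the user has not seen by similarity to items they have."""
    sims = item_similarities(ratings)
    seen = ratings[user]
    scores = defaultdict(float)
    for (a, b), sim in sims.items():
        if a in seen and b not in seen:
            scores[b] += sim * seen[a]
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Calling `recommend(ratings, "alice")` returns the unseen items ordered by how strongly they co-occur with what alice already rated. The pairwise loop is quadratic in the number of items, which is exactly the part a platform like Spark would distribute.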

With regard to horizontal scaling, there are two parts to consider: creating the models and serving the recommendations (the Seldon Server project).

Model creation is done in a variety of ways, but can be managed with scalable technologies such as Spark.

The Seldon Server project can be deployed on as many machines as you require, and they will work together to provide recommendations behind a load balancer. We have experience working with some very large news websites, so this part of our technology is well developed.

Great, thanks for your feedback. I will set up an installation in my home lab this weekend, super excited!
I maintain a small search engine that allows users to search for any file on my organization's servers. I'm primarily looking for a tool that will help me model users' search queries and user characteristics. Specifically, clustering their queries into groups like searches for documents, movies, or music, as well as what the client's OS is. Can Seldon do this easily for me, or is it not the right program for the job, or even overkill?
It sounds like you basically want a form of document clustering. Seldon has integrated the Semantic Vectors project (https://code.google.com/p/semanticvectors/), which can be used to do this. Alternatively, depending on your technical level, you could look directly at Semantic Vectors or toolkits such as gensim (https://radimrehurek.com/gensim/).
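
To give a feel for what clustering search queries means, here is a toy sketch in plain Python. It groups queries by word overlap (Jaccard similarity) and is only a stand-in for illustration: it is not how Semantic Vectors or gensim work, and `cluster_queries` and its `threshold` parameter are hypothetical names chosen for this example.

```python
def tokens(query):
    """Lowercased set of whitespace-separated words in a query."""
    return set(query.lower().split())


def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(queries, threshold=0.25):
    """Greedy single-pass clustering by word overlap.

    Each query joins the existing cluster whose vocabulary it overlaps
    most, provided the overlap reaches the threshold; otherwise it
    starts a new cluster. The threshold value is an arbitrary assumption.
    """
    clusters = []  # each entry: {"vocab": set of words, "members": list}
    for query in queries:
        words = tokens(query)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(words, cluster["vocab"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(query)
            best["vocab"] |= words
        else:
            clusters.append({"vocab": set(words), "members": [query]})
    return [cluster["members"] for cluster in clusters]
```

With queries such as "budget report pdf" and "rock music mp3", queries that share words end up in the same group. Real query logs are short and noisy, so a vector-space approach like the toolkits mentioned above is likely to do better than raw word overlap.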
I've not heard of Seldon before. I'd love to hear your 'elevator pitch' for it?
Seldon is an open-source predictive machine learning platform that includes a high-performance recommendation engine and data enrichment. It can run multiple algorithms and configurations to optimise KPIs. It will shortly include a pluggable architecture to allow data scientists to deploy their custom algorithms.
Explain to me like you are talking to a 5 year old. What can I do with this?
He offered to answer technical questions, so let me step in on this one: it's a recommendation engine. E.g., it can do things like Amazon's "you may also like" feature.

Try this:

http://docs.seldon.io/index.html