by sateesh 1629 days ago
Any recommendations on how to get started with recommendation engines and anomaly detection? Should one grok all the math prerequisites before starting, or can one pick up the math along the way? Appreciate your input.
1 comment

I am not an expert or industry practitioner in either recommendation engines or anomaly detection; I just have in mind simple and useful things to add to a website.

Suppose your website has posts and you want to flag posts that get an abnormally high number of likes, because they might be great reading or be complimenting your new release. You could collect a dataset of X, the number of likes each post has after a day. Then calculate the mean and variance of X and fit a normal distribution[1]. Then calculate z such that P(X >= z) = 0.01 (1%). z is the cut-off point that, under the fitted model, only about 1% of posts exceed. Then when a post goes above z, say at 1000 likes, you go and see what all the fuss is about.
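A rough sketch of that in Python, using only the standard library. The like counts and the function name like_threshold are made up for illustration, and this assumes the normal fit is good enough for your data:

```python
from statistics import NormalDist

def like_threshold(likes, tail_prob=0.01):
    """Return z such that P(X >= z) = tail_prob under a normal fitted to likes."""
    # from_samples estimates the mean and (sample) standard deviation
    fitted = NormalDist.from_samples(likes)
    # inv_cdf(p) gives the value with probability p below it,
    # so 1 - tail_prob leaves tail_prob above it
    return fitted.inv_cdf(1 - tail_prob)

# Hypothetical like counts after one day, one number per post
likes = [12, 30, 25, 8, 40, 22, 18, 35, 27, 15]
z = like_threshold(likes)

def is_unusual(like_count):
    """Flag a post whose day-one likes reach the cut-off z."""
    return like_count >= z
```

With only ten posts the estimate of z is very noisy; in practice you would want a lot more history before trusting the cut-off.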

I am just talking about applying school maths (the kind taught at ages 16-18) in a simple way to point out unlikely events. Of course, the distribution of likes may not look like a normal curve if you plot the number of posts with x likes against x, so a different distribution may make more sense. It may not be a perfect model, just a quick and dirty thing to try :).
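As one example of a different distribution (my own pick for illustration, not the only option): like counts are often skewed with a long right tail, so you could fit the normal curve to log(1 + likes) instead and convert the cut-off back. The data and the function name log_normal_threshold are hypothetical:

```python
import math
from statistics import NormalDist

def log_normal_threshold(likes, tail_prob=0.01):
    """Cut-off for the top tail_prob of posts, assuming log(1 + likes) is normal."""
    # log1p handles posts with zero likes, since log1p(0) == 0
    logged = [math.log1p(x) for x in likes]
    fitted = NormalDist.from_samples(logged)
    z_log = fitted.inv_cdf(1 - tail_prob)
    # expm1 undoes log1p, returning the cut-off in units of likes
    return math.expm1(z_log)

# Hypothetical skewed like counts: most posts are small, a few are large
skewed_likes = [3, 5, 8, 10, 12, 15, 20, 30, 60, 200]
skewed_threshold = log_normal_threshold(skewed_likes)
```

Plotting a histogram of the logged values is a quick way to check whether this assumption is any better than the plain normal one.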

Personally I enjoyed completing the free Andrew Ng Machine Learning course[2] on Coursera, which covers this as well as quickly training a simple recommendation engine for movies. It also covers multivariate Gaussian distributions if you want to flag based on more than one criterion. The maths in this course is relatively accessible and they go over what you may have forgotten, so you can pick up the maths as you go along.
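To give a flavour of the multivariate idea, here is a minimal two-feature sketch in plain Python: fit a 2D Gaussian to (likes, comments) pairs and flag points whose density falls below a threshold epsilon. The data, the function names and the way epsilon is picked are all assumptions for illustration; a real threshold would be tuned on labelled examples.

```python
import math

def fit_gaussian_2d(points):
    """Estimate the mean and covariance (sxx, sxy, syy) of 2D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return (mx, my), (sxx, sxy, syy)

def density(point, mean, cov):
    """Evaluate the 2D Gaussian density at point."""
    (mx, my), (sxx, sxy, syy) = mean, cov
    det = sxx * syy - sxy ** 2  # must be > 0 (features not perfectly correlated)
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d^T * inverse(cov) * d, written out for the 2x2 case
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def is_anomaly(point, mean, cov, epsilon):
    """Flag a point whose density is below the threshold epsilon."""
    return density(point, mean, cov) < epsilon

# Hypothetical (likes, comments) pairs after one day
posts = [(12, 2), (30, 5), (25, 4), (8, 1), (40, 8),
         (22, 3), (18, 3), (35, 6), (27, 5), (15, 2)]
mean, cov = fit_gaussian_2d(posts)
# Placeholder threshold: the lowest density seen among the known posts
epsilon = min(density(p, mean, cov) for p in posts)
```

The point of using both features together is that a post with an ordinary number of likes but far more comments than usual, e.g. is_anomaly((25, 40), mean, cov, epsilon), can be flagged even though neither a likes-only check nor the post's like count alone would look odd.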

Of course you can go far more complex if you like but I don't know much about that.

[1] Normal distribution https://en.wikipedia.org/wiki/Normal_distribution

[2] Andrew Ng Machine Learning course https://www.coursera.org/learn/machine-learning