Hacker News
by kds 5159 days ago
I felt a similar kind of skepticism when I saw that it took ~3 years to improve the Netflix recommendation system by just ~10% - in the context of the Netflix Prize, with great minds (data scientists and practitioners) participating and collaborating.

Maybe the initial system was already quite good and left no room for quick and easy enhancements, I don't know.

But a 10% overall improvement in 3 years (just as a quantitative ratio, esp. if it translates directly into the same growth pattern in revenue) is something that makes the business types yawn.

3 comments

There's a fantastic paper, Hand 2006, which notes the strong tendency for simple models to get nearly all of the performance possible out of solvable problems.

Hard problems do better with complex algorithms but there's also just less to be gained.

The best solution tends to be simple models applied to the right kind of data such that the problem has become easy. This is sometimes pretty difficult though since the simple models are designed on simple data, which might not always be what you've got.

But what if a 10% improvement means 10 M$/year?

Anyway, I think there are many applications where getting the absolute best performance isn't as important as finding the problem, figuring out how to apply a machine learning model to it (which includes getting the necessary training data), and then training an off-the-shelf model. The last of these may take a day or less; the other phases may well require both more thought and more effort.

A 10% increase in 3 years translates to around a 3.3% yearly growth rate. So in your example the 3.3% increase would be that 10M$, so 1% of your annual business revenue is 10/3.3, or just above 3M$.

But that means you already have a really significant business that makes ~300 M$ per year. And you manage to increase it just by peanuts (relatively speaking).
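
For anyone who wants to check the arithmetic, here is a rough Python sketch. The 10 M$/year gain is the hypothetical figure from the parent comment, and the function names are made up for illustration. It also shows that dividing 10% by 3 gives the ~3.3% used above, while a compounded rate comes out slightly lower.

```python
def simple_annual_rate(total_growth, years):
    """Total growth spread evenly over the period (no compounding)."""
    return total_growth / years


def compound_annual_rate(total_growth, years):
    """Constant yearly rate that compounds to the same total growth."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


def implied_revenue(annual_gain, annual_rate):
    """Revenue base for which `annual_rate` growth equals `annual_gain`."""
    return annual_gain / annual_rate


total_growth = 0.10      # 10% improvement over the whole period
years = 3
annual_gain_musd = 10.0  # hypothetical 10 M$/year from the parent comment

simple = simple_annual_rate(total_growth, years)
compound = compound_annual_rate(total_growth, years)

print(f"simple rate:   {simple:.2%}")
print(f"compound rate: {compound:.2%}")
print(f"implied revenue (simple):   {implied_revenue(annual_gain_musd, simple):.0f} M$")
print(f"implied revenue (compound): {implied_revenue(annual_gain_musd, compound):.0f} M$")
```

Either way the implied revenue base lands in the neighborhood of 300 M$, so the choice between simple and compound rates doesn't change the argument.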

And there is inflation in the economy, plus the opportunity cost of not investing such a huge sum, or part of it, in Apple stock (for example) during those years.

My point explained better:

The startup success of getting from zero to millions just because of clever ML/data-science/statistics is something to be respected and admired. But for an already big business, all this big-data buzz might provide just minor enhancement opportunities at best.

Of course all these numbers are hypothetical, and I have no idea what the actual Netflix numbers are, but aren't you assuming a 100% profit margin?

If you actually have a machine learning application that increases annual revenue by 10% and your initial annual revenue is $300M (like in the example) and your profit margin is 50% then (neglecting the $1M cost of the model because it's small and amortized over many years) your annual profits go from $150M to $180M which is a 20% increase. I don't think that is a number to yawn about.
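
A quick sketch of that calculation in Python, using the hypothetical numbers from this comment. As far as I can tell, the 20% figure assumes costs stay fixed, so the whole extra $30M lands in profit. The `variable_cost_share` parameter is my own addition (not something stated above) to show how the result changes if some costs scale with revenue.

```python
def profit_after_revenue_lift(revenue, margin, lift, variable_cost_share):
    """Return (old_profit, new_profit) after revenue grows by `lift`.

    variable_cost_share is the fraction of costs that scales with revenue:
    0.0 means all costs are fixed, 1.0 means costs grow in step with revenue.
    """
    old_profit = revenue * margin
    old_costs = revenue - old_profit
    fixed_costs = old_costs * (1.0 - variable_cost_share)
    variable_costs = old_costs * variable_cost_share
    new_revenue = revenue * (1.0 + lift)
    new_costs = fixed_costs + variable_costs * (1.0 + lift)
    return old_profit, new_revenue - new_costs


# Hypothetical figures from the comment: $300M revenue, 50% margin, 10% lift.
for share in (0.0, 0.5, 1.0):
    old, new = profit_after_revenue_lift(300.0, 0.50, 0.10, share)
    growth = new / old - 1.0
    print(f"variable cost share {share:.0%}: "
          f"profit ${old:.0f}M -> ${new:.1f}M ({growth:.1%})")
```

With all costs fixed the profit increase is 20%, as stated; if every cost scaled with revenue it would drop back to 10%.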

On your last point I actually think the opposite is true. The larger a company's operations the more potential cost savings there are. If profit margins are slim, as they are in many industries, the effect on earnings of relatively small cost savings or revenue increases can be large indeed.

Well, the widely known real example I mention is the Netflix Prize result -> ~3.3% CAGR (compound annual growth rate; the 10% is for the whole period of 3 years, not for 1 year).

The numbers you mention are the hypothetical ones. Go and find a real, publicly documented example that is close in values and margins to what you describe and I might agree.

NB. #1) Google and web search, or some similar startup success story, doesn't count as an example - they're the 2-person startup gone wildly successful, not a previously big entity that hired 2 ML geniuses to open their eyes.

NB. #2) If I could bring revolutionary added value to a company - through vastly enhanced data processing and analysis - then rather than consulting and educating somebody, I'd enter the industry as a competitor and prove the "old" guys don't understand the business anymore.

Last month Forbes reported that Netflix said 75% of what its customers watch comes from recommendations. Definitely some bang there.

Also, note that the 10% improvement was far from linear: year 1: +8.43%, year 2: +1%, year 3: +0.6% (!!).

An interesting observation, indeed - so the enhancement opportunities were effectively explored within the first year or so. The next two years count for far less, though I guess the cleverest approaches started to emerge just then.
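
To put a number on "within the first year or so", here is a small Python sketch that takes the per-year gains exactly as quoted in the parent comment (rounded figures, not official data) and works out each year's share of the total.

```python
# Per-year gains in percentage points, as quoted in the parent comment.
yearly_gain_pct = {"year 1": 8.43, "year 2": 1.0, "year 3": 0.6}

total = sum(yearly_gain_pct.values())

cumulative = 0.0
for year, gain in yearly_gain_pct.items():
    cumulative += gain
    share = gain / total
    print(f"{year}: +{gain:.2f} pts, {share:.0%} of the total, "
          f"cumulative {cumulative:.2f} pts")
```

On these figures, well over four fifths of the total gain arrived in the first year.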