Hacker News
by rocqua 1811 days ago
I used the same dataset for a small project during my CS master's. It was a really fun challenge, and it taught me a lot.

Most notably, it taught me how incredibly hard it is to make significant progress beyond the simplest, most naive approach. That approach was "Take the average rating a user gives, take the average rating a movie gets, and multiply them" (with ratings normalized to lie between 0 and 1).
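For anyone curious, here is a minimal sketch of that baseline in Python. It assumes raw ratings on a 1 to 5 star scale (the post only says ratings were normalized to [0, 1]), and the names `normalize`, `fit_baseline`, `predict` and the `fallback` value for unseen users or movies are hypothetical, not from the original project.

```python
from collections import defaultdict

def normalize(stars, lo=1, hi=5):
    """Map a star rating on an assumed lo..hi scale into [0, 1]."""
    return (stars - lo) / (hi - lo)

def fit_baseline(ratings):
    """Compute per-user and per-movie average ratings.

    `ratings` is an iterable of (user, movie, normalized_rating) tuples.
    """
    user_sum, user_n = defaultdict(float), defaultdict(int)
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for user, movie, r in ratings:
        user_sum[user] += r
        user_n[user] += 1
        movie_sum[movie] += r
        movie_n[movie] += 1
    user_avg = {u: user_sum[u] / user_n[u] for u in user_sum}
    movie_avg = {m: movie_sum[m] / movie_n[m] for m in movie_sum}
    return user_avg, movie_avg

def predict(user_avg, movie_avg, user, movie, fallback=0.5):
    """Baseline prediction: the user's average times the movie's average.

    `fallback` (hypothetical) stands in for users or movies with no ratings.
    """
    return user_avg.get(user, fallback) * movie_avg.get(movie, fallback)

if __name__ == "__main__":
    raw = [("alice", "m1", 5), ("alice", "m2", 3), ("bob", "m1", 4), ("bob", "m2", 1)]
    data = [(u, m, normalize(s)) for u, m, s in raw]
    user_avg, movie_avg = fit_baseline(data)
    print(predict(user_avg, movie_avg, "alice", "m1"))  # 0.65625
```

Since both averages lie in [0, 1], their product is never larger than either one, so this simple version tends to predict on the low side; that is just the approach as described, with no tuning.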

That method alone gave us 95% of the accuracy of our final method. If I remember my calculation correctly, our final method was ~90% as accurate as the prize-winning result.

1 comment

This is an important point about a lot of sophisticated models; you're really fighting for a few percent improvement over simple approaches. Sometimes a basic linear regression will get you 70% there, while a trained neural net will bring that up to... 75%.

A few percent can make a difference, especially in competitive areas, but the biggest win is just getting something in place where there was nothing before. It's a bit like optimizing code.