| Out of curiosity, did you compare to any other baselines? I suspect you did a lot better than you think you did, because that particular baseline is actually very misleading for ranking/recommendation tasks (this is a common source of confusion for newcomers). Here's why, in two parts: 1) Say you estimate (as you propose) that a user will always give their average rating. This might get you good-ish error and ROC as a prediction task, but will give zero recommendation value because the prediction for a given user will be constant for all possible recommendations. 2) Say you estimate that a user will give the average score that the item has received across all users. Again, possibly good-ish in terms of prediction ROC and RMS error, but this offers no personalization (all users get the same predictions, i.e. you're basically just showing the default Reddit ranking). Both of these baselines are vastly inferior to even really stupid models like "how many times have I upvoted stories from this submitter" in terms of recommendation value, but the latter is (if I recall from my own experiments) much worse when evaluated on the basis of overall ROC. I would strongly suspect that a correctly implemented NB or S1 would vastly outperform either of the two baselines in terms of actual recommendation utility (even though when you look at the baseline's ability to predict actual numbers, they might be comparably good in an RMS sense). The moral of the story: one must be very careful when trying to quantify the performance of learning systems; actual utility is often difficult to evaluate merely by looking at standard statistical measures of accuracy. |