by bulldoa 2402 days ago
I am confused: if a new model beats a randomly selected randomised model 100% of the time in each experiment, why does it matter if a randomised model beats other randomised models? Are they only comparing against the subset of worst randomised models?
2 comments

I think he's saying something like the following:

1/ the team implemented a naive baseline

2/ they implemented a more sophisticated model that depended on some parameter p

3/ for 100 different values of p, they examined its performance, and picked the model with the best performance

Now they're not quite subject to the multiple comparisons problem there, since the models with different values of p aren't independent from one another. But they're not not suffering from it either. It mostly depends on the model. But it's a very easy mistake to make. I'd say many many academic papers make the same mistake.
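To make steps 1/ to 3/ concrete, here is a rough sketch of what picking the best of 100 values of p can do. It is not the team's setup: the names `accuracy` and `pick_best_of`, the 50-example test set, and the assumption that every candidate has the same true accuracy of 0.5 are all made up for illustration. It also treats the candidates as independent, which, as noted above, real models with nearby values of p usually are not, so it overstates the effect somewhat.

```python
import random


def accuracy(rng, true_acc, n_examples):
    """Observed accuracy of a model with true accuracy `true_acc` on n_examples."""
    correct = sum(rng.random() < true_acc for _ in range(n_examples))
    return correct / n_examples


def pick_best_of(n_candidates=100, n_examples=50, true_acc=0.5, seed=0):
    """Score every candidate once, keep the best, then re-score it on fresh data.

    Every candidate has the same true accuracy (an assumption), so any gap
    between the two returned numbers comes purely from the selection step.
    """
    rng = random.Random(seed)
    scores = [accuracy(rng, true_acc, n_examples) for _ in range(n_candidates)]
    best_score = max(scores)  # the number that tends to get reported
    holdout_score = accuracy(rng, true_acc, n_examples)  # same model, new data
    return best_score, holdout_score


if __name__ == "__main__":
    best, holdout = pick_best_of()
    print(f"best of 100 on the selection set: {best:.2f}")
    print(f"same model on a fresh holdout:    {holdout:.2f}")
```

The selected score sits above 0.5 because it is the maximum of 100 noisy measurements, not because any candidate is actually better. Scoring the winner again on data that was not used for selection is the usual way to see how much of the gap survives.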

Short answer: if you do it right, it doesn't matter.

Long answer: I have a saying in statistics: "nature abhors two numbers: 0 and 100". In the real world, there is no 100%; you have a number of models and a (finite) number of trials/comparisons against whatever metric, and then you have to make a decision.

My point was that their "non randomised" models may in fact perform no better than a random model, and if that were the case, you would expect each of them to beat a randomised comparison roughly half the time. If you run repeated trials of multiple models, the odds of one consistently beating the others (even if its properties were essentially equivalent to a random model) in a small finite number of trials are much higher than most people realise. Essentially, they're flipping a large number of coins to determine their performance, and choosing the coins that consistently come up heads.
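The coin-flipping picture can be put into numbers. The sketch below assumes each model is a fair, independent coin (a 0.5 chance of winning any one trial) and asks how often at least one model out of a batch wins every trial. The function names and the figures of 100 models and 5 trials are mine, picked only to illustrate the point.

```python
import random


def prob_some_model_sweeps(n_models, n_trials, p_win=0.5):
    """Exact chance that at least one of n_models wins all n_trials."""
    p_sweep = p_win ** n_trials  # chance that one given model wins every trial
    return 1.0 - (1.0 - p_sweep) ** n_models


def simulate(n_models, n_trials, n_runs=2000, seed=0):
    """Monte Carlo estimate of the same probability, using fair coin flips."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_runs):
        # A run counts as a hit if any single model wins all of its trials.
        if any(
            all(rng.random() < 0.5 for _ in range(n_trials))
            for _ in range(n_models)
        ):
            hits += 1
    return hits / n_runs


if __name__ == "__main__":
    print(f"exact:     {prob_some_model_sweeps(100, 5):.3f}")
    print(f"simulated: {simulate(100, 5):.3f}")
```

With 100 coin-flip models and 5 trials each, the exact figure is about 0.96, so a "model" with a perfect record is close to guaranteed to turn up even though none of them has any skill. A single model tested on its own would sweep 5 trials only about 3% of the time.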

Another observation I'd make is that in the real world, random or average baselines are almost the most facetious thing to compare performance against. We aren't generally in a state of ignorance or randomness, but you see this kind of metric all the time, even from "respected" sources. Two if/then/else statements will generally outperform randomness in a huge number of fields/subject matter areas.

What's interesting is not that one can build a robot that beats/meets the average human at tennis (the average human is probably incapable of serving out a single game), but that one can build one that performs better than a relatively cheap implementation of our current state of knowledge of the game.

Moving from 2 if/then/else statements to an n parameter complicated model that requires training data and that no one understands and requires huge amounts of power and time to train is not only not progression, it's actually a regression on the current state of affairs. In almost all fields, random or average is the last thing you want to compare against.