> In particular, any hypothesis that tends towards magick, for example suggesting that a change in quantities (data, compute, training time) yields qualitative improvements (prediction transmogrifying into understanding, overfitting transforming into generalisation), should be discarded with extreme prejudice.
It does not tend towards magic. It does happen and people can replicate it. Melanie Mitchell recently revived Drew McDermott's point that AI people tend to use wishful mnemonics. Words like "understanding" or "generalisation" can easily be just wishful mnemonics. I fully agree with that. But the fact remains: a model that has ~100% training accuracy and ~0% validation accuracy on a simple but non-trivial dataset is able, with continued training, to reach ~100% training and ~100% validation accuracy.
> This hints at the fact that the effect depends on the structure of the dataset and so it's unlikely to, well, generalise to data that cannot be strictly controlled.
Indeed, but it is still interesting. It may be that it manifests itself because there is a very simple rule underlying the dataset and the dataset is finite. But it also seems to work under some degree of noise, and that's encouraging. So is the fact that it may help study the connection between wide, flat local minima and generalisation.
> You are impressed by the fact that one particular, counter-intuitive result was obtained
I'm impressed by the double descent phenomenon as well. And this one shows up all over the place.
> There is a well-known paper by John Ioannidis on cognitive biases in medical research: "Why Most Published Research Findings Are False"
I know about John Ioannidis. I've written and thought a lot about the replication crisis in science in general. BTW, it's quite a pity that Ioannidis himself started selecting data to fit his thesis with regard to COVID-19.
> It's not about machine learning per se, but its observations can be applied to any field where empirical studies are common, like machine learning.
Unfortunately, it applies to theoretical findings too. For example, the universal approximation theorem, the no free lunch theorem and the incompleteness theorems are widely misunderstood. There are also countless lesser-known theoretical results that are similarly misunderstood.
I confess that I'd be less suspicious if it reached less than full accuracy on the validation set. 100% accuracy on anything is a big red flag, and there's a little leprechaun holding it, jumping up and down and pointing at something. I'm about 80% confident that this "grokking" stuff will turn out to be an artifact of the dataset, or of the architecture, or of some elaborate self-deception on the researchers' part, courtesy of some nasty cognitive bias.
Perhaps one reason I'm not terribly surprised by all this is that uncertainties about convergence are common in neural nets. See early stopping as a regularisation procedure and, yes, double descent too. If we could predict when and how a neural net should converge, neural networks research would be a more scientific field and less of a let's-throw-stuff-at-the-wall-and-see-what-sticks kind of field.
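For concreteness, here is a minimal sketch of one common form of early stopping, the patience-based rule: training halts once the validation score has not improved for `patience` consecutive epochs. The function name, the `patience` values and the validation curve are hypothetical, made up for illustration rather than taken from any paper or library. The point is only that the rule has to guess, from the curve so far, that further training won't help, and a late improvement after a long plateau is exactly what it cannot anticipate.

```python
from typing import Sequence


def early_stopping_epoch(val_scores: Sequence[float], patience: int) -> int:
    """Return the epoch at which patience-based early stopping would halt.

    Hypothetical helper for illustration: training stops once the validation
    score has failed to beat its best value for `patience` consecutive epochs.
    If that never happens, the index of the last epoch is returned.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            # New best validation score: reset the patience counter.
            best = score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_scores) - 1


# Made-up validation-accuracy curve: a long plateau, then a late jump.
curve = [0.1, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.9, 1.0]
print(early_stopping_epoch(curve, patience=3))   # 4: halts on the plateau
print(early_stopping_epoch(curve, patience=10))  # 8: runs to the end, sees the jump
```

With a small patience the sketch stops on the plateau and never sees the jump; with a large one it trains to the end. Nothing in the curve up to the plateau tells you which choice is right.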
But who knows? I may be wrong. It's OK to be wrong, even mostly wrong, as long as you're wrong for the right reasons. Science gives us the tools to know when we're wrong, nothing more. The scientist must make peace with that. Thinking one can always be right is hubris.
Speaking of which, John Ioannidis is one of my personal heroes of science (sounds like an action figure line, right? The Heroes of Science!! dun-dun-duuunnn). I was a bit shocked that he came out so strongly sceptical of the mainstream concerns about Covid-19, and I've heard him make some predictions that soon proved to be false, like the number of people who would get Covid-19 in the USA (I think he said something like 20,000 people?). He really seemed to think that it was just another flu, which, btw, kills lots of people and we're just used to it, so perhaps that's what he had in mind. But I have the privilege of sharing my native language with Ioannidis (he's Greek, like me), so I've been able to listen to him speak on Greek news channels as well as English-speaking ones, and he remains a true scientist, prepared to express his knowledgeable opinion, as is his responsibility, even if it may be controversial, or just plain wrong. In the end, he's an infectious disease expert, and even his contrarian views lack that certain spark of madness in the eye that most others who share his opinions have. I mean that he speaks from knowledge, rather than just expressing some random view he's fond of. He's still a role model for me, even if he was wrong in this case.
>> Unfortunately, it applies to theoretical findings too. For example, the universal approximation theorem, the no free lunch theorem and the incompleteness theorems are widely misunderstood. There are also countless lesser-known theoretical results that are similarly misunderstood.
I guess? Do you have any examples you want to share? For my part, I try to avoid talking on the internet about things I don't work with on a daily basis. I know what I know. I don't need to know, or have an opinion on, everything...