|
|
|
|
|
by tbalsam
1047 days ago
|
|
I think you may be missing the extensive lines of research covering those topics. Memorization vs Generalization has been a debate before LW even existed in the public eye, and inputs that networks have unusual sensitivity to have been well studied as well (re:chaotic vs linear regimes in neural networks). Especially the memorization vs generalization bit -- that has been around for...decades. It's considered a fundamental part of the field, and has had a ton of research dedicated to it. I don't know much either way about RLHF in terms of its direct lineage, but I highly doubt that is actually what happened, since DeepMind is actually responsible for the bulk of the historical research supporting those methods. It's possible ala the broken clock hypothesis + LessWrong is obviously not the "primate at a typewriter" situation, so there's a chance of some people scoring meaningful contributions, but the signal to noise ratio is awful. I want to get something out of some of the posts I've tried to read there, but there are so many bad takes written with more bombastic language that it's really quite hard indeed. Right now, it's an active detractor to the field because it pulls attention away from things that are much more deserving of energy and time. I honestly wish the vibe was back to people even just making variations of Char-RNN repos based on Karpathy's blog posts. That was a much more innocent time. |
|
I meant this specific analysis, that neural networks that are over-parameterized will at first memorize but, if they keep training on the same dataset with weight decay, will eventually generalize.
Then again, maybe there have been analyses done on this subject I wasn't aware of.