by colah3
1264 days ago
I'm glad you've found it easy to follow! My best guess about the middle regime is that there are _empirical correlations between features_ due to the limited data. That is, even though the features are truly independent, there's some dataset size at which, by happenstance, some features start to look correlated, and not just in the sense of a single feature, but in a somewhat more general sense. The model can then represent something like a "principal component". But it's all an illusion created by the limited data, so it leads to terrible generalization! This isn't something I've dug into. The main reason I suspect this is that if you look at the start of the generalizing regime, you'll see that each feature has a few small features slightly embedded in the same direction as it. These seem to be features with slight empirical correlations, which is suggestive about the transition regime. But this is all speculation; there's lots we don't yet understand!
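To make the "empirical correlation" idea concrete, here is a minimal sketch of how truly independent features can look correlated when the dataset is small. It is only an illustration under assumptions: the sparse uniform features, the feature count, the sparsity level, and the helper name `max_spurious_correlation` are all hypothetical choices, not the setup from the experiments being discussed.

```python
import numpy as np


def max_spurious_correlation(n_samples, n_features=50, sparsity=0.7, seed=0):
    """Largest |correlation| observed between distinct, truly independent features.

    Assumption (illustrative only): each feature is zero with probability
    `sparsity` and otherwise uniform on [0, 1), sampled independently.
    """
    rng = np.random.default_rng(seed)
    active = rng.random((n_samples, n_features)) > sparsity
    x = active * rng.random((n_samples, n_features))

    # Empirical correlation matrix between features (columns).
    # A feature that never fires has zero variance and yields NaN; treat as 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(x, rowvar=False))

    # Ignore each feature's trivial correlation with itself.
    np.fill_diagonal(corr, 0.0)
    return float(np.abs(corr).max())


if __name__ == "__main__":
    for n in (30, 300, 3000, 30000):
        print(n, round(max_spurious_correlation(n), 3))
```

The true correlation between any two features here is zero, so whatever the script reports is purely a finite-sample artifact. It should shrink as the dataset grows, which is at least consistent with the guess that a model trained on limited data could latch onto structure that isn't really there.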