by sdenton4
1728 days ago
>> making a gigantic lookup table for every single possible bit combination

You don't need this to achieve zero loss on the training set, though: you only need a lookup table for the images in the train set. We know that neural networks can do something like this (learning the lookup table) because large networks can reach zero training loss on randomly assigned labels. (I linked the paper a bit further down in the thread.) This means there's some memorization capability in the architecture, even if it's a weird emulation of a memorization strategy that we would consider easy.

The actual mechanism here is probably closer to random projection + nearest neighbor; NNs are not obviously learning crypto functions. But they /are/ learning some kind of lookup mechanism. There's some indication (see Sara Hooker's work) that in practice they use a mixture of 'reasonable' strategies and memorization for long-tail training examples. We don't know /how much/ the leading networks trained on real labels rely on memorization, because we don't have any real insight into the learned structures.

(As an aside, we train neural networks for discontinuous functions all the time: classification is discontinuous by the nature of the labels. We turn it into a continuous, trainable problem by choosing a probabilistic framing.)
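To make the "random projection + nearest neighbor" idea concrete, here is a toy sketch (not a claim about what real networks learn internally). It assumes random inputs with labels drawn independently of those inputs, projects them with a Gaussian random matrix, and stores the projections as a lookup table. The function names, sizes, and seed are arbitrary choices for illustration.

```python
import math
import random

# Toy illustration only: a hypothetical "memorizer" built from a random
# projection plus a 1-nearest-neighbor lookup table. Names, sizes, and the
# seed are arbitrary choices for this sketch.

def random_matrix(rows, cols, rng):
    # Gaussian random projection matrix, scaled by 1/sqrt(rows)
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(cols)] for _ in range(rows)]

def project(matrix, x):
    # Matrix-vector product: maps a dim_in vector to dim_out features
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nn_predict(query, keys, labels):
    # Return the label of the closest stored key (the "lookup")
    best = min(range(len(keys)), key=lambda i: sq_dist(query, keys[i]))
    return labels[best]

def memorization_demo(n=200, dim_in=64, dim_out=16, classes=10, seed=0):
    rng = random.Random(seed)
    # Random "images" with labels drawn independently of the inputs
    train_x = [[rng.random() for _ in range(dim_in)] for _ in range(n)]
    train_y = [rng.randrange(classes) for _ in range(n)]
    test_x = [[rng.random() for _ in range(dim_in)] for _ in range(n)]
    test_y = [rng.randrange(classes) for _ in range(n)]

    proj = random_matrix(dim_out, dim_in, rng)
    keys = [project(proj, x) for x in train_x]  # lookup table over the train set only

    train_err = sum(nn_predict(project(proj, x), keys, train_y) != y
                    for x, y in zip(train_x, train_y)) / n
    test_err = sum(nn_predict(project(proj, x), keys, train_y) != y
                   for x, y in zip(test_x, test_y)) / n
    return train_err, test_err

if __name__ == "__main__":
    train_err, test_err = memorization_demo()
    print(f"train error: {train_err:.2f}, held-out error: {test_err:.2f}")
```

Every training point looks up its own stored key, so training error is zero even though the labels are pure noise, while held-out error should stay near chance (around 0.9 for 10 classes). That gap is the signature of a lookup table rather than a learned rule.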
And while we interpret the result of a classification as a 1 or 0, the underlying output is a continuous probability. In reality, our training examples are labeled with too much confidence: some labels are vague even for humans. If a network approximates a discontinuous function, it does so by approximating a continuous one. You can read more here: https://www.sciencedirect.com/science/article/abs/pii/089360...
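As a rough illustration of "approximating a discontinuous function by approximating a continuous one" (my own sketch, not taken from the linked paper): a steep logistic sigmoid gets arbitrarily close to a 0/1 step everywhere except in a small window around the jump, where it still outputs an in-between probability. The steepness values `k` and the `margin` around the discontinuity are arbitrary choices for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: a continuous map to (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def step(x):
    # The discontinuous 0/1 "label" function we would like to represent
    return 1.0 if x > 0 else 0.0

def max_gap(k, margin=0.1, n=2001):
    # Largest |sigmoid(k*x) - step(x)| on a grid over [-1, 1],
    # skipping points within `margin` of the discontinuity at 0
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return max(abs(sigmoid(k * x) - step(x)) for x in xs if abs(x) >= margin)

if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        print(k, max_gap(k))
```

The gap shrinks rapidly as `k` grows, but at `x = 0` the sigmoid always returns 0.5, never a hard 0 or 1, which is the continuous probability underneath the classification.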