| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by quocanh 1728 days ago

Err.. hash functions like MD5 and SHA256 are "cryptographic". That just means one with a random distribution of outputs as opposed to maybe Apple's "neural" hash function which has outputs that do the "augmentation invariant projection" you speak of.

What I'm trying to say is that neural networks are "universal approximators of continuous real functions". You can think of them as finding the curve of a function which matches the data to an expected and they get their predictive power by matching the underlying "function" of the problem.

Applying a cryptographic hash function is like completely scrambling the underlying function. The only way for a neural network to match it is if it was somehow a universal approximator of a discontinuous real function. You can either do that by getting into unexplored chaos theory or making a gigantic lookup table for every single possible bit combination. The former no human being knows how to do, and the latter is impossible for even a 64 bit combination (nevermind an entire image, audio clip, or video).

1 comments

sdenton4 1728 days ago

>> making a gigantic lookup table for every single possible bit combination

You don't need this to achieve zero loss on the training set, though: You only need a lookup table for the images in the train set.

We know that neural networks can do something like this (learning the lookup table) because large networks can get to zero training loss on randomly assigned labels. (I linked the paper a bit further down in the thread.) This means there's some memorization capability in the architecture, even if it's a weird emulation of some memorization strategy that we would consider easy.

The actual mechanism here is probably closer to random projection + nearest neighbor; NNs are not obviously learning crypto functions. But they /are/ learning some kind of lookup mechanism. There's some indication (see Sara Hooker's work) that in practice they use a mixture of 'reasonable' strategies and memorization for long-tail training examples. We don't know /how much/ the leading networks trained on real labels rely on memorization because we don't have any real insight into the learned structures.

(as an aside, we train neural networks for discontinuous functions all the time: Classification is discontinuous, by the nature of the labels. We turn it into a continuous+trainable problem by choosing a probabilistic framing.)

link

quocanh 1727 days ago

Okay but that would only work for examples with which you already have. All interesting cases of neural networks are applying it to unseen inputs. How does your technique work with unseen inputs?

And while we interpret the result of a classification as a 1 or 0, the underlying result is a continuous probability. Even in reality, our training examples are labeled with too much confidence - some labels are vague even for humans. If it approximates a discontinuous function, then it does so by approximating a continuous function. You can read here for more information: https://www.sciencedirect.com/science/article/abs/pii/089360...

link

sdenton4 1726 days ago

Yes, this is the point: When we train a neural network, especially on a classification problem, it has multiple avenues to solve the problem. We know they are capable of ineffectual memorization, as well as some other less ridiculous things. When we train, it's not clear what mix we're getting of 'neural hashing' vs learning abstracted features.

My point up above is that classification problems are too weak, exactly because these kinds of shortcuts are readily available. The leading edge of ML research is over-focused on ImageNet classification in particular.

link

quocanh 1725 days ago

Ok so according to your theory, we could make this hypothesis: if we applied a neural network to an unseen example (for example, a validation dataset), then we would get accuracy that is equivalent to randomly picking a random label. Well surprise, surprise - we obviously don't get that. So there is clearly more going on than "neural hashing".

You're not answering this problem with unseen data so it's really hard for me to follow your reasoning here.

link