| People have the notion that a latent representation in the animal sense -- i.e., a concept -- is the same thing as your "representation" in the NN sense. That's not the case. You're right that if I find predictive compressions of faces, say `F1..n`, then they aren't literal "rememberings". And they seem to be able to participate in a decision process (e.g., classification) which doesn't seem to target pixel patterns. However, I think this is kind of an illusion. What `F1..n` are is ambiguous pixel-space projections of the abstraction, which isn't present in this projection. When I have the concept "this type of face" I can reason with it beyond similarity in pixel-space. When we form representations we aren't restricted to reasoning with them in only one space (e.g., how faces look as pixels). We (perhaps superstitiously) impart to machine "representations" an actual depth which they lack. They are templates derived from the spaces they live in, e.g., pixel-space, and have only the properties that space affords (e.g., pixel-geometry). Reasoning beyond that space, and those properties, doesn't work. People think it does. This is the illusion. Templates derived from this data, that we provide, function like actual representations because we simplify the world for the machine -- and prepare its environment so that its pixel-space templates are good enough. |
> When I have the concept "this type of face" I can reason with it beyond similarity in pixel-space.
I think there are at least two possible things that might be going on here:
1. we're "trained" on non-pixel data (to use the same framing) and so it seems obvious that we would reason with a concept like "this type of face" in a non-pixel space.
2. the experience of "reasoning with it" is an illusion, and is merely the subjective experience that we have when our brains do stuff with whatever their underlying representation is. That is, we may have no real knowledge of what space our own "face model" is built on, what it represents, what properties it considers.