|
|
|
|
|
by extr
1477 days ago
|
|
This is kind of already what's happening inside the NN. You can think of intermediate layers in the network as talking to each other in "NN-ease", that is, translating from one form of representation (encoding) to another. At the final encoder layer, the input is maximally compressed (for that given dataset/model architecture/training regime). The picture (millions of pixels) of the dog is reduced to a few bits of information about what kind of dog it is and how it's posed, what color the background is, etc. However, optimality of encoding is entirely relative to the decoding scheme used and your purposes. Obviously a matrix of numbers representing a summary of a paragraph can be in some sense "more compressed" than the English equivalent, but it's useless if you don't speak matrices. Similarly, you could invent an encoding scheme with Latin characters that is more compressed than English, but it's again useless if you don't know it or want to take the time to learn it. If we wanted we could make English more regular and easier to learn/compress, but we don't, for a whole bunch of practical/real life reasons. There's no free lunch in information theory. You always have to keep the decoder/reader in mind. |
|