> > Just intuitively, in such a high dimensional space, two random vectors are basically orthogonal.

> What's the intuition here? Law of large numbers?

Imagine for simplicity that we consider only vectors pointing parallel/antiparallel to coordinate axes.

- In 1D, you have two possibilities: {+e_x, -e_x}. So if you pick two random vectors from this set, the probability of getting something orthogonal is 0.
- In 2D, you have four possibilities: {±e_x, ±e_y}. If we pick one random vector and get e.g. +e_x, then picking another one randomly from the set has a 50% chance of getting something orthogonal (±e_y are 2/4 possibilities). Same for other choices of the first vector.
- In 3D, you have six possibilities: {±e_x, ±e_y, ±e_z}. Repeat the same experiment, and you'll find a 66.7% chance of getting something orthogonal.
- In the limit of ND, you can see that the chance of getting something orthogonal is 1 - 1/N, which tends to 100% as N becomes large.

Now, this discretization is a simplification of course, but I think it gets the intuition right.
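If you want to check the counting argument numerically, here is a minimal sketch. It is only an illustration: the function names are made up for this example, and the last function goes one step beyond the discretized picture by assuming the "random vectors" have independent standard-normal entries, which is one reasonable reading but not the only one.

```python
import math
import random
from fractions import Fraction


def exact_orthogonal_probability(n):
    """Exact chance that two uniform draws from {±e_1, ..., ±e_n} are orthogonal."""
    # There are 2n signed axis vectors; only the 2 lying on the first
    # draw's axis fail to be orthogonal to it.
    return Fraction(2 * n - 2, 2 * n)


def simulated_orthogonal_probability(n, trials=20_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        axis_a, sign_a = rng.randrange(n), rng.choice((-1, 1))
        axis_b, sign_b = rng.randrange(n), rng.choice((-1, 1))
        # Dot product of two signed axis vectors: ±1 on the same axis, else 0.
        dot = sign_a * sign_b if axis_a == axis_b else 0
        hits += (dot == 0)
    return hits / trials


def mean_abs_cosine(n, trials=200, seed=0):
    """Average |cos(angle)| between pairs of Gaussian vectors (an assumed model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        total += abs(dot / (norm_u * norm_v))
    return total / trials


if __name__ == "__main__":
    for n in (1, 2, 3, 100):
        print(n, exact_orthogonal_probability(n), simulated_orthogonal_probability(n))
    for n in (10, 100, 1000):
        print(n, mean_abs_cosine(n))
```

In the continuous case the vectors are almost never exactly orthogonal, so the thing to watch is the typical cosine shrinking toward zero as N grows, rather than a probability climbing toward 1.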
Theoretically, I can claim that N random vectors whose elements are independent zero-mean real numbers (say, drawn from a continuous distribution with a standard deviation of 1 per element) will "with probability 1" span an N-dimensional space. I can even grind on, subtracting the parallel parts of each vector pair, until I have N orthogonal vectors. ("Gram-Schmidt" from high school.) I believe I can "prove" that.
So then the mapping using those vectors is "invertible." Nyeah. But back in numerical reality, I think the resulting inverse will become practically useless as N gets large.
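For anyone who wants to test that hunch rather than take my word for it, here is a rough sketch of the experiment. Assumptions, all mine: Gaussian entries, the "modified" variant of Gram-Schmidt, and made-up helper names. It reports two things per N: the smallest leftover length before normalization (zero would mean the vectors did not span the space) and how far the result is from truly orthogonal in floating point.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Modified Gram-Schmidt.

    Returns the orthonormal basis and, for each input vector, the length of
    what was left after subtracting its components along earlier directions.
    """
    basis, residual_norms = [], []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = dot(w, q)  # component of w along the direction q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        residual_norms.append(norm)
        # A zero norm would mean linear dependence; for continuous random
        # entries that is a probability-zero event, so it is not handled here.
        basis.append([wi / norm for wi in w])
    return basis, residual_norms


def orthogonality_error(basis):
    """Largest |q_i . q_j| over distinct pairs; 0 would be exact orthogonality."""
    return max(
        (abs(dot(basis[i], basis[j])) for i in range(len(basis)) for j in range(i)),
        default=0.0,
    )


def experiment(n, seed=0):
    rng = random.Random(seed)
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    basis, residual_norms = gram_schmidt(vectors)
    return min(residual_norms), orthogonality_error(basis)


if __name__ == "__main__":
    for n in (4, 16, 64):
        smallest, err = experiment(n)
        print(f"N={n:3d}  smallest residual={smallest:.3e}  orthogonality error={err:.3e}")
```

Watching how those two numbers move as N grows is a cheap way to see whether the inverse really degrades the way I suspect, at least for this toy linear setup.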
That's without the nonlinear elements, which are designed to make the system non-invertible. It's not shocking if someone proves mathematically that this doesn't quite technically work. I think it would only be interesting if they could find numerically useful inverses for an LLM that has interesting behavior.
All -- I haven't thought very clearly about this. If I've screwed something up, please correct me gently but firmly. Thanks.