| At 500GB, you can store nearly everything ever written -- let alone compressed. All statistical learning is a variation on k-NN (see the relevant paper on this), but likewise this is obvious a priori: k-NN is the ideal learner, and a good starting point for analysis. The question for any given system is: what is the learning space, what is the distance function, and how many points are being considered? NNs set up a compressed X,y space, in that space choose points via an empirical expectation, and obtain a weighted average as their prediction. That's just what they do -- there isn't any other mechanism here. The whole formal structure of the NN can be written down on a page of paper. Your paper above doesn't deal with this --
it's a reply to the 'forced interpolation' view,
which I haven't espoused. But often NNs are forced to interpolate. 'Extrapolation' is of
course a part of the possible predictive output of a statistical learning system -- in that its latent space is taken to be embedded in R^n,
and so one can 'veer off' into R. Whenever you attribute a higher-fidelity space
to a small latent space you are, in effect, extrapolating. |
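
For concreteness, the three questions in the quoted comment (what is the space, what is the distance function, how many points are considered) map onto a distance-weighted k-NN regressor roughly as below. This is only a minimal sketch: the names `knn_predict` and `euclidean`, the inverse-distance weighting, and the `eps` smoothing term are assumptions for illustration, not anything specified in the thread.

```python
import math

def euclidean(a, b):
    """Distance function: plain Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X, y, query, k=3, distance=euclidean, eps=1e-12):
    """Predict a target for `query` as a weighted average of its k nearest stored points.

    X: stored points (the "learning space"), y: their targets,
    k: how many points are considered, distance: the distance function.
    """
    # Rank every stored (point, target) pair by distance to the query.
    ranked = sorted(zip(X, y), key=lambda pair: distance(pair[0], query))
    neighbors = ranked[:k]
    # Inverse-distance weights (an assumed choice); eps avoids division by zero.
    weights = [1.0 / (distance(point, query) + eps) for point, _ in neighbors]
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, neighbors)) / total

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 2.0, 3.0]
inside = knn_predict(X, y, (1.5,), k=2)    # query between stored points
outside = knn_predict(X, y, (10.0,), k=2)  # query far outside the stored points
```

Because the weights are nonnegative and normalized, the prediction always lies between the smallest and largest target among the chosen neighbors, so `outside` stays between 2.0 and 3.0 even though the query sits far from the data.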
No, you cannot.
>That's just what they do -- there isn't any other mechanism here.
That's not what they do. There are many papers now showing that ICL (in-context learning) performs some kind of optimization during inference, which would not be happening if all they did was retrieval.
I've come to realize you don't know what you're talking about. Your level of denial is scary to see.