| Statistical learning can typically be phrased in terms of k nearest neighbours. In the case of NNs we have a "modal kNN" (memorising) going to a "mean kNN" ("generalising") under the right sort of training. I'd call both of these memorising, but the latter is a kind of weighted recall. Generalisation as a property of statistical models (i.e., models of conditional frequencies) is not the same property as generalisation in the case of scientific models. A scientific model is general because it models causally necessary effects from causes -- so, necessarily, if X then Y. Whereas generalisation in associative stats is just about whether you're drawing data directly from the empirical frequency distribution or whether you've modelled it first. In all automated stats the only difference between "the model" and "the data" is some sort of weighted averaging operation. So in automated stats (i.e., ML, AI) it's really just a question of whether the model uses a mean. |
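To make the "modal kNN" vs "mean kNN" distinction concrete, here is a rough sketch in plain Python. The function names (`modal_knn`, `mean_knn`), the toy data, and the choice of inverse-distance weights are all assumptions for illustration, not anything taken from a real library or from how NNs are actually implemented.

```python
from collections import Counter


def nearest(train, x, k):
    """Return the k stored (xi, yi) pairs whose xi is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def modal_knn(train, x, k=1):
    """'Memorising': return the most common stored label among the k neighbours."""
    labels = [y for _, y in nearest(train, x, k)]
    return Counter(labels).most_common(1)[0][0]


def mean_knn(train, x, k=2):
    """'Weighted recall': inverse-distance weighted mean of the neighbours' labels."""
    neighbours = nearest(train, x, k)
    # Small epsilon avoids division by zero when x matches a stored point (assumed choice).
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in neighbours]
    total = sum(w * y for w, (_, y) in zip(weights, neighbours))
    return total / sum(weights)


# Hypothetical toy data: y = x^2 sampled at four points.
train = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0), (3.0, 9.0)]

stored_labels = {y for _, y in train}
modal_out = modal_knn(train, 1.5, k=1)  # always one of the stored labels
mean_out = mean_knn(train, 1.5, k=2)    # a blend of stored labels
```

The modal version can only ever hand back a label it has stored; the mean version returns a value that appears nowhere in the data, but it is still just a weighted average of stored labels.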
You can look at it by results: I give these models inputs they've never seen before, and they give me outputs that are correct / acceptable.
You can look at it in terms of data: we took petabytes of data, and with an 8 GB model (Stable Diffusion) we can output an image of anything. That's an unheard-of compression ratio, only possible if it's generalizing - not memorizing.
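For a sense of scale, a back-of-the-envelope calculation of that ratio might look like this. The 1 PB figure is an assumption standing in for "petabytes" (the actual training-set size isn't given here), and decimal units are assumed.

```python
PETABYTE = 10**15  # bytes, decimal units (assumed)
GIGABYTE = 10**9   # bytes, decimal units (assumed)


def compression_ratio(data_bytes, model_bytes):
    """How many bytes of training data there are per byte of model."""
    return data_bytes / model_bytes


# Assumed: 1 PB of training data; 8 GB model size as quoted above.
ratio = compression_ratio(1 * PETABYTE, 8 * GIGABYTE)
print(f"{ratio:,.0f} : 1")  # 125,000 : 1
```

Every additional petabyte of training data adds another 125,000 to that ratio.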