Hacker News
by vandreas2 1429 days ago
What is the difference between memorization and learning? Could you please elaborate on this? It always seemed to me that a lot of learning is in fact memorization; otherwise you wouldn't need a large dataset of car photos from every angle to be able to recognise them (or at least from some angles, so that ML can work out the in-between poses; no amount of 'learning' from photos of the front can work out what a car looks like from the side). Also, in what context would you get expensive ML disasters? If you keep retraining on cars as new car models come out, then you get 100% recognition, memorization notwithstanding, which in the end is what you would want.
2 comments

Learning in this sense means the model is able to extrapolate to unseen data (i.e., it has learned correlations that are inherent to the problem, not just to the training data). Memorizing, on the other hand, implies that it will do very well on examples that are in, or nearly identical to, the training data, and will break (sometimes pretty hilariously) on unseen data.
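A toy sketch of that distinction, under loud assumptions: the "problem" is a made-up linear relationship (`true_fn`), the "memorizer" is a plain lookup table, and the "learner" is a least-squares line. None of this comes from a real system; it only shows the train/test gap the paragraph above describes.

```python
import random

random.seed(0)

def true_fn(x):
    # Hypothetical ground-truth relationship, chosen only for illustration.
    return 2.0 * x + 1.0

train = [(x, true_fn(x)) for x in (random.uniform(0, 10) for _ in range(50))]
test = [(x, true_fn(x)) for x in (random.uniform(0, 10) for _ in range(50))]

# "Memorizer": an exact lookup table over the training inputs.
table = dict(train)

def memorizer(x):
    # An unseen input falls back to an arbitrary default answer.
    return table.get(x, 0.0)

# "Learner": ordinary least-squares fit of a straight line.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

def learner(x):
    return slope * x + intercept

def mse(model, points):
    # Mean squared error of a model over (input, target) pairs.
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

for name, model in [("memorizer", memorizer), ("learner", learner)]:
    print(name, "train MSE:", mse(model, train), "test MSE:", mse(model, test))
```

The lookup table is perfect on the training set and useless on fresh samples from the same distribution, while the fitted line does well on both. Real models sit somewhere between these two extremes.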

The fact that refitting once a day improves real-world performance actually makes me think that the problem/data they work on is highly non-stationary, not that the model is memorizing. If it were purely memorizing, the model would perform poorly on all non-training data and would not work for even one day.
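A minimal sketch of what non-stationarity looks like, assuming a made-up data stream whose mean drifts by a fixed amount per day (`DRIFT_PER_DAY` and the noise level are invented for illustration). The "model" is just the mean of one day's data; one copy is fit once on day 0, the other is refit every day on the previous day's data.

```python
import random

random.seed(1)

DRIFT_PER_DAY = 0.5  # assumed drift rate, purely illustrative

def sample_day(day, n=200):
    # Each day's data is centered a little further along than the day before.
    return [DRIFT_PER_DAY * day + random.gauss(0, 0.1) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def mae(prediction, xs):
    # Mean absolute error of a constant prediction against one day's data.
    return mean([abs(x - prediction) for x in xs])

days = [sample_day(d) for d in range(31)]

# Stale model: fit once on day 0, then used unchanged on days 1..30.
stale_model = mean(days[0])
stale_err = [mae(stale_model, days[d]) for d in range(1, 31)]

# Refit model: each day, predict with the mean of the previous day's data.
refit_err = [mae(mean(days[d - 1]), days[d]) for d in range(1, 31)]

print("stale model, day 1 error: ", round(stale_err[0], 2))
print("stale model, day 30 error:", round(stale_err[-1], 2))
print("daily refit, worst error: ", round(max(refit_err), 2))
```

The stale model is still close on day 1 and degrades steadily after that, which is the signature of drift rather than of memorization; a pure memorizer would already fail on day 1.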

To your point about large datasets - the large datasets are what allow learning to take place. The most common forms of models we have now will memorize when they only have a few examples, and only “learn” when the training data is large enough. There is work to improve learning from a handful of examples, but in many of these cases the approach requires a model that was already trained on the domain in question, which is then specialized to a specific use case.

OK, I think I get what you're saying: memorizing means the model will perform well only on inputs identical to ones it has seen in training, and will fail on even slight variations. My question then becomes: where is the boundary of that extrapolation, for it to be considered learning rather than memorization? To come back to my earlier example, let's say a model is trained on photos of cars taken from only some angles; it should then be able to extrapolate to the intermediate ones without having seen the exact same photo. A large dataset would ensure that it has enough angles to be able to do that, right? But I wouldn't think any number of photos will make it identify a car from underneath if it hasn't already seen one from underneath. So is this the limit of 'learning' as opposed to 'memorization'?
Yes, your explanation is essentially correct. There is work being done in the areas you’re talking about - essentially forcing models to learn “concepts” more explicitly - and in very large language models that seems to be emerging naturally. But current vision models would almost certainly break when trying to identify a vehicle from a shot of its underside if they had never seen a vehicle’s undercarriage during training. Current vision models are capable of identifying vehicles from arbitrary angles (when viewed from the side or head-on) and in arbitrary shades, colors, models, etc., and that’s about the amount of extrapolation we’d be talking about.
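A rough numerical analogy for "fine between seen angles, broken outside them", assuming a toy 1-D stand-in for the vision problem: the input is an "angle", the target is `sin(angle)`, and the model is a 1-nearest-neighbour lookup (swapped in here for simplicity, not what real vision models do). Training covers only [0, pi].

```python
import math

def nearest_neighbor_predict(train, x):
    # Return the label of the closest training input.
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def mse(train, xs):
    # Mean squared error against the toy target sin(x).
    return sum(
        (nearest_neighbor_predict(train, x) - math.sin(x)) ** 2 for x in xs
    ) / len(xs)

# Training "angles" densely cover [0, pi]; nothing in (pi, 2*pi] is ever seen.
train = [(math.pi * i / 199, math.sin(math.pi * i / 199)) for i in range(200)]

# Queries between the seen angles, and queries in the never-seen range.
seen_range = [math.pi * (j + 0.5) / 500 for j in range(500)]
unseen_range = [math.pi + math.pi * (j + 0.5) / 500 for j in range(500)]

in_range_mse = mse(train, seen_range)
out_of_range_mse = mse(train, unseen_range)

print("MSE between seen angles:   ", in_range_mse)
print("MSE in the unseen range:   ", out_of_range_mse)
```

Queries that fall between training points are answered almost exactly, even though none of them was seen verbatim. Queries from the uncovered range all get the label of the nearest edge point, which plays the role of the undercarriage shot.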
I don’t think your premise is correct. The holy grail of such systems - human intelligence - will break in the same way if it’s asked to identify a car from its undercarriage when the human subject has never seen an undercarriage. We really forget how much data humans are exposed to in their formative years. I’d often bend down to fetch my ball when it had accidentally slid under a parked car, and that’s how I learnt what undercarriages look like.
There actually is no difference, like, at all.

GPT-2 was for a time the best lossless text compressor (lossless compression is just memorization) - https://bellard.org/libnc/gpt2tc.html
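For context on why a predictive model can act as a compressor at all: an arithmetic coder driven by a model spends roughly -log2(p) bits on a symbol the model assigned probability p, so better prediction means fewer bits. The sketch below is not how gpt2tc is implemented; it assumes a far simpler adaptive byte-frequency model (with add-one smoothing over 256 byte values) and only computes the ideal code length, without doing any actual encoding.

```python
import math
from collections import Counter

def adaptive_code_length_bits(data: bytes) -> float:
    # Ideal code length when each byte is coded with the probability the
    # model assigned to it *before* seeing it. A decoder can rebuild the
    # same counts as it goes, so no side information is needed.
    counts = Counter()
    total = 0
    bits = 0.0
    for byte in data:
        p = (counts[byte] + 1) / (total + 256)  # add-one smoothing
        bits += -math.log2(p)
        counts[byte] += 1
        total += 1
    return bits

# Illustrative, highly repetitive input (made up for this example).
text = ("the cat sat on the mat. " * 40).encode("ascii")

baseline_bits = 8 * len(text)  # no model: 8 bits per byte
model_bits = adaptive_code_length_bits(text)

print("baseline bits:", baseline_bits)
print("model bits:   ", round(model_bits))
```

The first byte costs the full 8 bits because the model knows nothing yet; the cost per byte drops as the counts adapt. A stronger predictor such as a language model pushes the same idea much further.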