| HN Mirror



	by latentsea 363 days ago
	I guess they will now just rotate all the images in the training data 90 degrees too to fill this kind of gap.

3 comments

recursivecaveat 363 days ago

Everything old is new again: the AlexNet paper that kicked off the deep learning wave in 2012 describes horizontally flipping every image as a cheap form of data augmentation. Now that we expect models to actually read text, though, that seems potentially counterproductive. Rotations are similar, in that you'd hope the model would learn heuristics such as the sky almost always being at the top.
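
For readers who haven't seen it, the kind of augmentation being discussed can be sketched in a few lines. This is a minimal illustration only, assuming an image is a row-major list of lists of pixel values; the helper names `hflip`, `rot90_cw`, and `augment` are made up for this example and don't come from any particular library.

```python
def hflip(image):
    """Mirror a row-major image left to right."""
    return [row[::-1] for row in image]


def rot90_cw(image):
    """Rotate a row-major image 90 degrees clockwise."""
    # Reverse the row order, then transpose.
    return [list(col) for col in zip(*image[::-1])]


def augment(image):
    """Return the original plus one flipped and three rotated copies."""
    variants = [image, hflip(image)]
    rotated = image
    for _ in range(3):
        rotated = rot90_cw(rotated)
        variants.append(rotated)
    return variants


# Toy 2x3 "image" (hypothetical pixel values).
image = [[1, 2, 3],
         [4, 5, 6]]

for variant in augment(image):
    print(variant)
```

Each variant keeps the original label, which is exactly why flipping becomes questionable once the image contains text that the model is supposed to read.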


latency-guy2 363 days ago

At least from when I was still doing this kind of work, look angle/platform angle scatterer signal (radar) mattered more than rotation, but rotation was a simple way to get quite a bit more samples. It never stopped being relevant :)


bonoboTP 363 days ago

That's called data augmentation. It was already common before AlexNet, and it never stopped being common; it's still routinely done.


mirekrusin 363 days ago

That's how you train a neural network with synthetic data so it extracts actual meaning.

That's also how humans learn, e.g. adding numbers. First there is naive memorization, followed by more examples until you get it.

LLM training seems to be falling into a memorization trap because models are extremely good at it, orders of magnitude better than humans.

IMHO what is missing in the training process is feedback explaining why an answer is wrong. What we're currently doing in training is leaving this understanding as an "exercise for the reader". We're feeding in correct answers to specific, individual examples, which promotes memorization.

What we should be doing in post-training is ditching direct backpropagation on the next token. Instead, let the model finish its wrong answer, append an explanation of why it's wrong, and continue backpropagation for the final answer, now with the explanation in context to guide it to the right place in understanding.

What all of this means is that current models are largely underutilized and unnecessarily bloated; they contain way too much memorized information. Making a model larger is an easy, quick illusion of improvement. Models need to be squeezed more, and more focus needs to go towards the training flow itself.
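
One possible reading of the post-training idea above, sketched as data preparation only: keep the wrong attempt and the explanation in context, but compute the loss only on the final answer. This is an assumption about what the comment means, not an established recipe. The function `build_example`, the whitespace "tokenizer", and the arithmetic example are all hypothetical, and whether the explanation tokens should also be trained on is left open.

```python
def build_example(prompt, wrong_answer, explanation, final_answer):
    """Concatenate the segments and mark which tokens contribute to the loss."""
    segments = [
        (prompt, False),        # context only
        (wrong_answer, False),  # the model's own attempt, not a target
        (explanation, False),   # feedback on why the attempt is wrong
        (final_answer, True),   # the only tokens that are trained on
    ]
    tokens, loss_mask = [], []
    for text, train in segments:
        pieces = text.split()  # stand-in for a real tokenizer (assumption)
        tokens.extend(pieces)
        loss_mask.extend([train] * len(pieces))
    return tokens, loss_mask


tokens, mask = build_example(
    prompt="What is 17 + 26 ?",
    wrong_answer="17 + 26 = 33",
    explanation="Wrong : the carry from 7 + 6 was dropped .",
    final_answer="17 + 26 = 43",
)

# Tokens that would receive a gradient under this masking scheme.
trained = [t for t, m in zip(tokens, mask) if m]
print(trained)
```

In a real training loop the mask would typically be applied to the per-token loss before averaging; that part is omitted here.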


atwrk 362 days ago

> That's also how humans learn, e.g. adding numbers. First there is naive memorization, followed by more examples until you get it.

Just nitpicking here, but this isn't how humans learn numbers. They start at birth with competency up to about 3 or 5 and expand from that. So they can already work with quantities of varying size (e.g. they know which is more, the four apples on the left or the five on the right, and they also know what happens if I take one apple from the left and put it with the others on the right), and then they learn the numbers. So yes, they learn the numbers through memorization, but only the signs/symbols, not the numeric competency itself.


mirekrusin 362 days ago

Turtles all the way down: things like the meaning of "more" are also memorized, e.g. initially as "I want more food" etc., then refined with time. For example, a kid saying "he's more than me" is corrected by explaining that there needs to be some qualifier for a measurable quantity, e.g. "he's more tall (taller) than me" or "he is more fast (faster) than me" etc.

Using different modalities (like images, videos, voice/sounds instead of pure text) is interesting as well, as it helps complete the meaning, adds a sense of time, etc.

I don't think we're born with any concepts at all, it's all quite chaotic initially with consistent sensory inputs that we use to train/stabilise our neural network. Newborns, for example, don't even have a concept of separation between "me and the environment around me"; it's learned.


atwrk 361 days ago

> I don't think we're born with any concepts at all, it's all quite chaotic initially with consistent sensory inputs that we use to train/stabilise our neural network.

That is exactly the thing that doesn't seem to be true, or at least it is considered outdated in neuroscience. We very much have some concepts that are innate, and all other concepts we learn in relation to the things that are already there in our brains, which at birth is mostly sensorimotor stuff. We decidedly don't learn new concepts from scratch, only in relation to already acquired concepts.

So our brains work quite a bit differently from LLMs, despite the neuron metaphor used there.

And regarding your food example, the difference I was trying to point out: for LLMs, the word and the concept are the same thing. For humans they are different things that are also learned differently. The memorization part (mostly) only affects the word, not the concept behind it. What you described was only the learning of the word "tall": the child in your example already knew that the other person was taller than them, it just didn't know how to talk about that.


mirekrusin 360 days ago

The name "LLM" became a misnomer once we started directly adding different modalities. In that sense "word and concept" are not the same thing, because a multimodal LLM can express a concept as e.g. an image and a sentence.


littlestymaar 363 days ago

And it will work.

I just wish the people who believe LLMs can actually reason and generalize would see that they don't.


ben_w 362 days ago

If that were evidence that current AI doesn't reason, then the Thatcher effect would be evidence that humans don't: https://en.wikipedia.org/wiki/Thatcher_effect

LLMs may or may not "reason", for certain definitions of the word (there are many), but this specific thing doesn't differentiate them from us.


t-3 362 days ago

Being tricked by optical illusions is more about the sensory apparatus and image processing faculties than reasoning, but detecting optical illusions is definitely a reasoning task. I doubt it's an important enough task to train into general models though.


latentsea 363 days ago

At this point I think all reasoning really means is having seen enough of the right training data to make the correct inferences, and they're just missing some training data.
