by 0-_-0 757 days ago
To me this finding shows how transformers don't generalise, since they need specialised embeddings to handle a problem.
3 comments

I think this is more a matter of how numbers are input and lack of specific training, including visual training.

For example, the number 12,345,678 is input to ChatGPT as the three tokens "123" "456" "78", which isn't the best starting point for learning that this is an 8-digit number with specific digit positions!

https://platform.openai.com/tokenizer
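
To make the misalignment concrete, here's a rough sketch. This is not the real tokenizer (that uses a learned vocabulary, and the link above shows the actual splits); the two helper functions are hypothetical and only mimic fixed three-digit chunking. It shows that chunking from the left reproduces the split quoted above, while chunking from the right would line up with thousands groups.

```python
def chunk_left_to_right(digits: str, size: int = 3) -> list[str]:
    """Hypothetical helper: fixed-size chunks starting from the leftmost digit."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def chunk_right_to_left(digits: str, size: int = 3) -> list[str]:
    """Hypothetical helper: fixed-size chunks aligned to the units digit."""
    head = len(digits) % size  # length of the short leading chunk, if any
    chunks = [digits[:head]] if head else []
    chunks += [digits[i:i + size] for i in range(head, len(digits), size)]
    return chunks


print(chunk_left_to_right("12345678"))  # ['123', '456', '78']
print(chunk_right_to_left("12345678"))  # ['12', '345', '678']
```

With the left-to-right split, the chunk "456" covers the ten-thousands, thousands and hundreds places, so no chunk boundary matches a thousands separator. The model has to work out place value from the total length instead.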

As a human child, you learn about numbers largely visually: pointing to units, tens, hundreds, etc., aligning them in columns to add, and so on. Maybe a multi-modal model, if it were visually trained on chalkboard primary school math, would do better at learning the concept of position-based powers of 10.

I'd say the key point here isn't that they "need" specialised embeddings, but rather that the embeddings improve things and the model can somewhat manage without them.

That's a far more surmountable problem. Maybe you need one model for biology and another for coding, etc., i.e. a broad split by domain. That's still weak AI, not truly general in the AGI sense, but it still seems like a good next step.

The fact that transformers generalize is kinda evident from the fact that they can solve novel puzzles.