Hacker News
by HarHarVeryFunny 822 days ago
> Arithmetics is extremely easy for a neural network to perform and learn perfectly

That'd depend on the design of the neural net and training objective.

It's certainly not something that comes naturally to an LLM, which neither has numbers as inputs or outputs nor is trained with an arithmetic objective.

Consider inputting "12345 * 10" into GPT-4. First thing it is going to do is tokenize the input, then embed these tokens, and these embedding vectors are then the starting point of what the transformer has to work with...

https://platform.openai.com/tokenizer

You can use OpenAI's tokenizer tool (above) to see how it represents the "12345 * 10" character sequence as tokens, and the answer is that it breaks it down into the token ID sequence [4513, 1774, 353, 220, 605]. The [4513, 1774] represents the character sequence "12345", and "605" represents the character sequence "10".
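To make that concrete, here is a toy sketch of the tokenization step. It is not OpenAI's tokenizer: the vocabulary holds only the five pieces needed for this one string, the split of "12345" into "123" and "45" is an assumption, and greedy longest-match is a simple stand-in for the real BPE merge procedure.

```python
# Toy vocabulary: only the pieces needed for "12345 * 10".
# The IDs are the ones quoted above; the "123" / "45" split is an assumption.
TOY_VOCAB = {"123": 4513, "45": 1774, " *": 353, " ": 220, "10": 605}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenizer (a stand-in for real BPE)."""
    ids = []
    pos = 0
    max_len = max(len(piece) for piece in vocab)
    while pos < len(text):
        # Try the longest candidate piece first, then shorter ones.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise ValueError(f"no token for text at position {pos}")
    return ids


token_ids = toy_tokenize("12345 * 10")
print(token_ids)
```

The point is only that the number 12345 never reaches the model as a number; it arrives as two unrelated-looking integer IDs.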

These token IDs will then be "embedded", which means mapping them to points in a very high dimensional space (e.g. 4096-D for LLaMA 7B), so each of those token IDs becomes a vector of 4096 real-valued numbers, and these vectors are what the model itself actually sees as input.
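A rough sketch of what that lookup amounts to, assuming a tiny 8-dimensional table in place of 4096 and random values in place of learned ones. The names `embedding_table` and `embed` are made up for illustration. Each entry is an ordinary floating-point number.

```python
import random

EMBED_DIM = 8           # toy size; the example above is 4096 for LLaMA 7B
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical embedding table: one real-valued vector per token ID.
# In a trained model these values are learned, not random.
token_ids = [4513, 1774, 353, 220, 605]
embedding_table = {
    tid: [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)] for tid in token_ids
}


def embed(ids):
    """Look up the vector for each token ID."""
    return [embedding_table[tid] for tid in ids]


vectors = embed(token_ids)
print(len(vectors), "vectors of dimension", len(vectors[0]))
```

Nothing in a vector says "this is the digit string 123"; whatever numeric meaning exists has to be learned from how the tokens co-occur in text.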

So, for "12345 * 10", what the model sees during training is that whenever it sees V1 V2 V3 V4 it should predict V5, where V1-5 are those 4096-D input token embeddings. The model has no idea what any of these mean - they might represent "the cat sat on the mat" for all it knows. They are just a bunch of token representations, and the LLM is just trying to find patterns in the examples it is given to figure out what the preferred "next token" output is.
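As a minimal sketch of that training setup, assuming the usual next-token formulation: one token sequence is turned into (context, target) pairs, and nothing in the pairs marks the tokens as numbers. The function name `next_token_examples` is hypothetical.

```python
def next_token_examples(token_ids):
    """Turn one token sequence into (context, target) training pairs."""
    return [(token_ids[:i], token_ids[i]) for i in range(1, len(token_ids))]


# IDs for "12345 * 10" as quoted above; the model never sees the characters.
sequence = [4513, 1774, 353, 220, 605]
examples = next_token_examples(sequence)
for context, target in examples:
    print(context, "->", target)
```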

So, could you build (and train) a neural net to multiply, or add, two numbers together? Yes you could, if that is all you want to do. Is that what an LLM is? No, an LLM is a sequence predictor, not an NN designed and trained to do arithmetic, and all that is inside an LLM is a transformer (sequence-to-sequence predictor).
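For contrast, here is about the smallest "net" that takes numbers in and is trained with an arithmetic objective: a single linear unit fitted to a + b with plain SGD. This is only a sketch under my own choices of learning rate, step count and input range. It covers addition only; multiplication is not linear in its inputs, so it would need a different design.

```python
import random


def train_adder(steps=2000, lr=0.1, seed=0):
    """Fit y = w1*a + w2*b + bias to the target a + b with plain SGD."""
    rng = random.Random(seed)
    w1, w2, bias = 0.0, 0.0, 0.0
    for _ in range(steps):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        error = (w1 * a + w2 * b + bias) - (a + b)
        # Gradient of 0.5 * error**2 with respect to each parameter.
        w1 -= lr * error * a
        w2 -= lr * error * b
        bias -= lr * error
    return w1, w2, bias


w1, w2, bias = train_adder()
print(w1, w2, bias)            # weights should end up close to 1, 1, 0
print(w1 * 3 + w2 * 4 + bias)  # should be close to 7
```

Here the inputs are actual numbers and the loss is the arithmetic error, which is exactly what an LLM's token-in, token-out setup lacks.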

1 comments

I know why it is hard for LLMs to learn this, that was the whole point. The way we make LLMs today means they can't identify such structures, and that is strong evidence they won't become smart just by scaling since all the things you brought up will still be true as we scale up.

To solve this you would need some sub networks that are pretrained to handle numbers and math and other domains, so that when you start training the giant LLM it can find and connect those things. But we don't know how to do that well yet afaik, and I bet all the big players have already tested things like that. As you say adding capabilities to the same model is hard.

An LLM can learn to identify math easily enough, it's just that performing calculations just using language isn't very efficient, even if it's basically what we do ourselves. If you want an LLM to do it like us, then give it a pencil and paper ("think step by step").
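The "pencil and paper" idea can be sketched as ordinary long multiplication with every intermediate step written out, which is roughly what a step-by-step prompt asks the model to produce as text. The function name `long_multiply` and the wording of the steps are just illustrative.

```python
def long_multiply(a, b):
    """Multiply two non-negative integers one digit of b at a time,
    recording each intermediate step as a line of text."""
    steps = []
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * 10 ** place
        total += partial
        steps.append(
            f"{a} x {digit} x 10^{place} = {partial}; running total = {total}"
        )
    return total, steps


result, steps = long_multiply(12345, 10)
for line in steps:
    print(line)
print("answer:", result)
```

Each written step is a small, easy sub-problem, and the running total lives on the "paper" instead of having to be carried implicitly in one prediction.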

If you want the LLM to be better than a human at math, then give it a calculator, or access to something like Wolfram Alpha for harder problems. Your proposed solution of "give it a specialized NN for math" is basically the same, but if you are going to give it a tool, then why not give it a more powerful one like a calculator?!
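A minimal sketch of what such a calculator tool could look like on the receiving end, assuming the model emits the expression as a plain string. The `calculator` function is hypothetical and not any particular vendor's tool API; it walks the parsed expression and allows only basic arithmetic.

```python
import ast
import operator

# Only these binary operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate a basic arithmetic expression without using eval()."""

    def evaluate(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            op = _BINARY_OPS[type(node.op)]
            return op(evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


print(calculator("12345 * 10"))
```

The model's job then shrinks to recognizing that a calculation is needed and writing out the expression; the exact arithmetic happens outside the network.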