| > Arithmetics is extremely easy for a neural network to perform and learn perfectly That'd depend on the design of the neural net and training objective. It's certainly not something that comes naturally to an LLM which neither has numbers as inputs or outputs, nor is trained with an arithmetic objective. Consider inputting "12345 * 10" into GPT-4. First thing it is going to do is tokenize the input, then embed these tokens, and these embedding vectors are then the starting point of what the transformer has to work with... https://platform.openai.com/tokenizer You can use OpenAI's tokenizer tool (above) to see how it represents the "12345 * 10" character sequence as tokens, and the answer is that it breaks it down into the token ID sequence [4513, 1774, 353, 220, 605]. The [4513, 1774] represents the character sequence "12345", and "605" represents the character sequence "10". These token ID's will then be "embedded", which means mapping them to points in a very high dimensional space (e.g. 4096-D for LLaMA 7B), so each of those token ID's becomes a vector of 4096 1's and 0's, and these vectors are what the model itself actually sees as input. So, for "12345 * 10", what the model sees during training is that whenever it sees V1 V2 V3 V4 it should predict V5, where V1-5 are those 4096-D input token embeddings. The model has no idea what any of these mean - they might represent "the cat sat on the mat" for all it knows. They are just a bunch of token representations, and the LLM is just trying to find patterns in the examples it is given to figure out what the preferred "next token" output is. So, could you build (and train) a neural net to multiply, or add, two numbers together? Yes you could, if that is all you want to do. Is that what an LLM is? No, an LLM is a sequence predictor, not an NN designed and trained to do arithmetic, and all that is inside an LLM is a transformer (sequence-to-sequence predictor). |
To solve this you would need some sub networks that are pretrained to handle numbers and math and other domains, and then you start training the giant LLM it can find and connect those things. But we don't know how to do that well yet afaik, and I bet all the big players has already tested things like that. As you say adding capabilities to the same model is hard.