|
|
|
|
|
by robbedpeter
1536 days ago
|
|
The math the models are doing are similar to rote rule chaining as opposed to calculation. The errors made look like kludged together lookups. I wonder if you could sequence the training of a model so that you could reinforce calculations over lookups, to encourage the development of an accurate and advanced mathematics module. Neural networks can do math, but a lookup and memorized value model is structurally a lot different than a calculator model. The difference between them is a matter of weights for any given architecture. Tokenizing properly for math would help, but doing bit level tokenizing would be best, because that would allow multimodal domains to integrate more readily (i.e. audio/video/text models could share learned features more easily than if you are using parsed or domain specific tokens.) It's a great time to be alive. |
|