by Jensson 302 days ago
> But the algorithms they teach humans in school to do long-hand arithmetic (which are liable to be the only algorithms demonstrated in the training data) require a single unique numeral for every digit.

But humans don't see single digits; we learn to parse noisy visual data into single digits and then use those digits to do the math.

It is much easier for these models to work out what the number is from its tokens than it is for a visual model to do the same from an image. Getting those tokens streamed straight into the system makes the model's problem much simpler than the one humans solve. We weren't born able to read numbers; we learned that.
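
To make the "parse first, then do the math" point concrete, here is a rough sketch. It assumes a made-up tokenization where a number arrives as multi-digit string chunks (real tokenizers split numbers differently), and the function names are mine, not from any library. Once the chunks are flattened into single digits, the long-hand school algorithm applies unchanged.

```python
def split_into_digits(tokens):
    """Flatten multi-digit tokens such as ["12", "345"] into single digits.

    The token split is a hypothetical example, not any real tokenizer's output.
    """
    return [int(ch) for token in tokens for ch in token]


def long_hand_add(a_digits, b_digits):
    """Schoolbook addition over lists of single digits, most significant first."""
    result = []
    carry = 0
    i, j = len(a_digits) - 1, len(b_digits) - 1
    # Walk both numbers from the rightmost digit, carrying as taught in school.
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += a_digits[i]
            i -= 1
        if j >= 0:
            total += b_digits[j]
            j -= 1
        result.append(total % 10)
        carry = total // 10
    return result[::-1]


a = split_into_digits(["12", "345"])  # the number 12345 as two chunks
b = split_into_digits(["999"])        # the number 999 as one chunk
print(long_hand_add(a, b))
```

The parsing step here is a one-liner, which is the point: going from tokens to digits is a lookup, while going from pixels to digits is a perception problem.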