by michaelnny 753 days ago
I think the main problem is the way we turn raw mathematical symbols or equations into tokens, and this suboptimal tokenization may decrease performance.
2 comments

I think that's far from the only problem. To me the most obvious problem is that we use right-to-left numbers (think about the order in which you write digits when doing long addition) in a left-to-right language. Without a special number-flipping step, the transformer is forced to produce the output token by token, i.e. from left to right, so it has to emit the most significant digit first even though that digit can depend on carries from every digit after it. Without the ability to store additional internal state, this turns addition into an O(N²) problem purely due to the suboptimal output ordering!
The paper discusses this, and the approach taken in the paper implements a number-flip stage, so numbers are formatted with their least significant digit first.
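To make the ordering point concrete, here is a rough sketch of why flipped numbers help. This is not the paper's code; the names flip, unflip and add_lsd_first are made up for illustration. With the least significant digit first, each output digit depends only on the inputs already read plus a single carry, so the sum can be emitted strictly left to right in one pass.

```python
def flip(n: int) -> str:
    """Format a non-negative integer with its least significant digit first."""
    return str(n)[::-1]


def unflip(s: str) -> int:
    """Inverse of flip: read a least-significant-digit-first string as an int."""
    return int(s[::-1])


def add_lsd_first(a: str, b: str) -> str:
    """Add two flipped digit strings, emitting output digits left to right.

    The only state carried between steps is one carry value, which is why
    this ordering suits token-by-token generation.
    """
    out = []
    carry = 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0  # pad the shorter number with zeros
        db = int(b[i]) if i < len(b) else 0
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))  # final carry becomes the most significant digit
    return "".join(out)


if __name__ == "__main__":
    total = add_lsd_first(flip(987), flip(45))
    print(total, unflip(total))
```

For 987 + 45 the flipped inputs are "789" and "54", and the digits of the answer come out in the order 2, 3, 0, 1, which unflips to 1032.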
That doesn't stop decent code output for many computer languages.