Hacker News
by toxik 757 days ago
I think there is a good reason to look for low-hanging fruit that pays dividends on these types of tasks, not because solving addition with a transformer is a good idea, but because it could improve performance in other parts of the network. Maybe there are other subsequences that could be annotated in this way? Per paragraph, tokens per word, who knows.

Obviously, the "best" way to do addition on a computer is by doing it exactly.

1 comment

>> I think there is a good reason to find low-hanging fruits that pay dividends on these types of tasks, not because solving addition with a transformer is a good idea, but because it could improve performance in other parts of the network.

The paper makes this claim, but if they could do that, they'd have shown it already. Instead, their hand-crafted, artisanal embeddings only work well for addition, only weakly for multiplication and sorting, and not at all for other arithmetic operations.