Hacker News new | ask | show | jobs
by YeGoblynQueenne 757 days ago
>> I think there is a good reason to find low-hanging fruits that pay dividends on these types of tasks, not because solving addition with a transformer is a good idea, but because it could improve performance in other parts of the network.

The paper makes this claim but if they could do that, they'd have showed it already: instead their hand-crafted, artisanal embeddings only work well for addition and only weakly for multiplication and sorting, and not at all for other arithmetic operations.