|
|
|
|
|
by YeGoblynQueenne
757 days ago
|
|
>> I think there is a good reason to find low-hanging fruits that pay dividends on these types of tasks, not because solving addition with a transformer is a good idea, but because it could improve performance in other parts of the network. The paper makes this claim but if they could do that, they'd have showed it already: instead their hand-crafted, artisanal embeddings only work well for addition and only weakly for multiplication and sorting, and not at all for other arithmetic operations. |
|