by manmal
498 days ago
I mean that maybe gradient descent is a passable sorting algorithm once the weights have been learned to properly describe ordering. It may be a speciality of transformers that they can sort things well, which wouldn’t tell us much about whether they are mentalists or not.