by mlsu
1356 days ago
Your comment reminded me of this video [1]. They put an eye tracker on someone and captured their motion while walking over rough terrain. You can sort of see that the person is focusing on the most likely place their foot will go next.

[1] https://www.youtube.com/watch?v=ph6uUHq3a-g

I think we will discover a more efficient way to encode temporal relationships than the current approach, which appears to be "just throw transformers at it." My guess is that this attention will be applied in a more conceptual latent space.