|
|
|
|
|
by Ukv
616 days ago
|
|
"human intentions are not a generalisation of visual information" is a bit confusing category-wise. Question would be to what extent you can predict someone's next action, like running out to retrieve a ball, given just what a human driver can sense. Clearly that's possible to some extent, and in theory it should be possible for some system receiving the same inputs to reach human-level performance on the task, but it seems very challenging given the imposed constraints. Also, for clarity, note that the limitations don't require the model be trained only on driver-view data. It may be that reasoning capability is better learned through text pretraining for instance. |
|