by fauigerzigerk
616 days ago
>Models don't need to have been trained on every single possibility - it's possible for them to generalize and interpolate/extrapolate.

They do have some in-distribution generalisation capabilities, but human intentions are not a generalisation of visual information.

Clearly that's possible to some extent, and in theory it should be possible for some system receiving the same inputs to reach human-level performance on the task, but it seems very challenging given the imposed constraints.
Also, for clarity, note that the limitations don't require that the model be trained only on driver-view data. It may be that reasoning capability is better learned through text pretraining, for instance.