by ilaksh
2428 days ago
The AIs in the benchmark are all trained exclusively on text, correct? My assumption has always been that to get human-level understanding, AI systems need to be trained on things like visual data in addition to text. This is because a fair amount of information is not encoded in text at all, or at least is not described in enough detail. I mean, humans can't learn to understand language properly without using their other senses. You need something visual or auditory to associate with the words, which are really supposed to represent complex, detailed systems. I think the gap would be much more obvious if there were questions involving things like spatial reasoning, or combining image recognition with spatial reasoning and comprehension.

Remember that deaf-blind people exist, so if you're sure that you "need something visual or auditory", then those people are not, according to your beliefs, able to understand language. I think they'll disagree with you quite strongly.