by Sharlin 1180 days ago
The question is what happens when you go multimodal (which these things can do) and GPT(N+1) learns the associations between words and images/video, as well as the relationships between successive frames of video. At what point does it become unreasonable to claim that it doesn't "understand" something? How good at general-purpose prediction does an AI have to be before people accept that it obviously has an internal model of things and is capable of abstractions?

(Assuming that this happens, of course. Diminishing returns could make scaling infeasible past some point, for instance.)