|
|
|
|
|
by cyanmagenta
199 days ago
|
|
I am having trouble understanding the distinction you’re trying to make here. The computer has the same pixel information that humans do and can spend its time analyzing it in any way it wants. My four-year-old can count the legs of the dog (and then say “that’s silly!”), whereas LLMs have an existential crisis because five-legged-dogs aren’t sufficiently represented in the training data. I guess you can call that perception if you want, but I’m comfortable saying that my kid is smarter than LLMs when it comes to this specific exercise. |
|
Also my bet would be that video capable models are better at this.