Hacker News
by
Torkel
664 days ago
Multimodal models are not limited by semantic understanding though, right?
They are given photos, video, audio.
1 comment
elzbardico
664 days ago
For them, it is all a stream of tokens, roughly speaking.
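To make "a stream of tokens" concrete, here is a rough sketch of how an image and a text prompt could end up in one flat sequence. Everything in it is an assumption for illustration: the byte-level text tokenizer, the 2x2 patch size, and the helper names (`text_to_tokens`, `image_to_tokens`, `build_stream`) are hypothetical, and real multimodal models use learned tokenizers and embeddings rather than raw values.

```python
from typing import List, Tuple

# A token here is just (modality tag, raw values). Real models map these
# to learned embedding vectors; the tag is kept only for readability.
Token = Tuple[str, Tuple[int, ...]]


def text_to_tokens(text: str) -> List[Token]:
    """Hypothetical text tokenizer: one token per UTF-8 byte."""
    return [("text", (b,)) for b in text.encode("utf-8")]


def image_to_tokens(pixels: List[List[int]], patch: int) -> List[Token]:
    """Split a grayscale image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row into one token. Rows and columns
    that do not fill a whole tile are dropped in this sketch.
    """
    height, width = len(pixels), len(pixels[0])
    tokens: List[Token] = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            values = tuple(
                pixels[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            )
            tokens.append(("image", values))
    return tokens


def build_stream(text: str, pixels: List[List[int]], patch: int = 2) -> List[Token]:
    """Concatenate image tokens and text tokens into one flat sequence."""
    return image_to_tokens(pixels, patch) + text_to_tokens(text)


# A tiny 4x4 "image" and a two-character prompt.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
stream = build_stream("hi", image)
```

With the 4x4 image and the prompt "hi", `stream` holds four image tokens followed by two text tokens, and from that point on the model sees only the sequence.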