| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Torkel 715 days ago
	Multimodal models are not limited by semantic understanding though, right? They are given photos, video, audio.

1 comments

For them, it is all a stream of tokens, roughly speaking.