| HN Mirror



	by numba888 500 days ago
	Great, but how do you imagine multimodal training with text and video? Take just those two for simplicity: what would be in the training set? With text, the model tries to predict the next token, and more steps were added later. But what do you do with multimodal?