| HN Mirror



	by teaearlgraycold 116 days ago
	Well, diffusers are trained unsupervised on raw pictures. I don't know how they train multimodal LLMs on images, but yes, they are obviously consuming media other than text. I don't think (but would be happy to be corrected) that models glean much of their "knowledge" from non-textual training data.


mikert89 116 days ago

you couldn't be more wrong


teaearlgraycold 116 days ago

Please tell me more. When I ask an LLM a question, and get a text response, can that response incorporate non-textual information from visual training data?
