| HN Mirror



	by derac 16 days ago
	Look at the table of supported modalities. It can take image/video/text/actions as input and produce image/video/text/actions as output.

1 comment

That just raises more questions. What kind of "observation or action" image does the input generate? What is an action output if it's not text?