HN Mirror

Hacker News


	by rafaelero 972 days ago
	Not using embeddings/a lookup table means they can't generate images or audio, which to me is a severe limitation. Why bother going through the process of building a multimodal transformer if it can generate nothing but text?

2 comments

For an AI agent that needs to navigate a computer (which, IIRC, is Adept's use case), it should work, since it only has to output commands.

Many applications only need multimodal input, not multimodal output.
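The trade-off discussed above can be sketched as a toy example. It assumes (the thread does not say so) that "embeddings/lookup table" refers to discrete image tokens from a VQ-style codebook. Every size, vocabulary, and function name below is hypothetical, and a projected patch vector stands in for the transformer's actual hidden state. Variant A feeds patches through a continuous linear projection, so the output head only has rows for text tokens. Variant B snaps each patch to a codebook entry, which gives images their own token ids that the model could also emit.

```python
import random

rng = random.Random(0)  # fixed seed so the toy weights are reproducible

D_MODEL = 4       # hypothetical hidden size
PATCH_PIXELS = 6  # hypothetical flattened patch size
TEXT_VOCAB = ["click", "type", "scroll", "<eos>"]  # hypothetical command vocabulary


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def matvec(rows, vec):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in rows]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


# Variant A: continuous patches. Pixels go through a linear projection, so there
# is no discrete "image token" id, and the output head only covers text tokens.
patch_proj = [rand_vec(PATCH_PIXELS) for _ in range(D_MODEL)]
text_unembed = [rand_vec(D_MODEL) for _ in TEXT_VOCAB]


def embed_patch_continuous(pixels):
    return matvec(patch_proj, pixels)


def decode_text_only(hidden):
    # Whatever the hidden state is, the result is always a text token.
    return TEXT_VOCAB[argmax(matvec(text_unembed, hidden))]


# Variant B: a VQ-style codebook (the assumed "lookup table"). Each patch is
# snapped to its nearest code, and every code gets its own output row, so the
# model can also emit image tokens that a separate decoder could turn into pixels.
CODEBOOK = [rand_vec(PATCH_PIXELS) for _ in range(3)]
IMAGE_VOCAB = [f"<img_{i}>" for i in range(len(CODEBOOK))]
joint_vocab = TEXT_VOCAB + IMAGE_VOCAB
joint_unembed = [rand_vec(D_MODEL) for _ in joint_vocab]


def quantize_patch(pixels):
    """Return the id of the closest codebook entry (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(pixels, code)) for code in CODEBOOK]
    return argmax([-d for d in dists])


def decode_joint(hidden):
    return joint_vocab[argmax(matvec(joint_unembed, hidden))]


if __name__ == "__main__":
    patch = rand_vec(PATCH_PIXELS)
    hidden = embed_patch_continuous(patch)  # stand-in for the transformer's output
    print("text-only head picks:", decode_text_only(hidden))
    print("patch quantized to:", IMAGE_VOCAB[quantize_patch(patch)])
    print("joint head picks:", decode_joint(hidden))
```

In Variant A, `decode_text_only` can never return an image token, which is the limitation raised in the parent comment. For an agent that only emits commands, as the first reply points out, that head is all it needs.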