by rafaelero 972 days ago
Not using an embedding lookup table means they can't generate images or audio, which to me is a severe limitation. Why bother going through the process of building a multimodal transformer if it can generate nothing but text?
2 comments

For an AI agent that should navigate a computer (which is Adept's use case, IIRC), it should work, as it only has to output commands.
Many applications only need input, not output.