Hacker News new | ask | show | jobs
by oefnak 445 days ago
Well, you should check again, because it hasn't been text to text for a while. There are multimodal models now.
1 comments

If we want to get pedantic, in context of practical transformer models it has never been text to text. It has always been tokens to tokens. The tokens can represent anything. "multimodal" is a marketing term.