|
|
|
|
|
by lukev
244 days ago
|
|
Think about it this way; they are encoding whole "thoughts" or "ideas" as single tokens. It's effectively a multimodal model, which handles "concept" tokens alongside "language" tokens and "image" tokens. A really big conceptual step, actually, IMO. |
|