by milansuk, 803 days ago
This is an implementation of a transformer, and the README presents it as text->text. Tokens are just integers going in and out. Is it possible to use it to train other types of LLMs (text->image, image->text, speech->text, etc.)?
Patch of pixels ---> token
Fragment of input audio ---> token
etc.
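Here is a rough sketch of that mapping. It assumes a fixed, hand-picked codebook; a real system would learn one, e.g. with a VQ-VAE. The idea is to cut the input into patches (image) or frames (audio), then replace each chunk with the index of its nearest codebook vector, so the transformer still only sees integers. The function names are hypothetical and not from the repo.

```python
from typing import List, Sequence

Vector = List[float]


def image_to_patches(image: List[List[float]], patch: int) -> List[Vector]:
    """Split an H x W grayscale image into non-overlapping patch x patch blocks,
    each flattened row-major. Assumes H and W are multiples of `patch`."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def audio_to_frames(samples: Sequence[float], frame: int) -> List[Vector]:
    """Split a 1-D waveform into non-overlapping frames (drops a trailing partial frame)."""
    return [list(samples[i:i + frame])
            for i in range(0, len(samples) - frame + 1, frame)]


def quantize(vectors: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each vector to the index of its nearest codebook entry (squared L2 distance).
    The codebook here is a hypothetical stand-in for a learned one."""
    ids = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        ids.append(min(range(len(codebook)), key=dists.__getitem__))
    return ids


if __name__ == "__main__":
    # 4x4 checkerboard of 2x2 blocks; codebook: "all dark" = 0, "all bright" = 1
    img = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [1, 1, 0, 0],
           [1, 1, 0, 0]]
    print(quantize(image_to_patches(img, 2), [[0.0] * 4, [1.0] * 4]))  # -> [0, 1, 1, 0]

    # short waveform in frames of 2 samples; codebook: "quiet" = 0, "loud" = 1
    wave = [0.0, 0.1, 0.9, 1.0, 0.05, 0.0]
    print(quantize(audio_to_frames(wave, 2), [[0.0, 0.0], [1.0, 1.0]]))  # -> [0, 1, 0]
```

Those integer ids could then go into the same vocabulary as the text tokens. Many vision transformers take a different route and feed a linear projection of each raw patch in as an embedding, skipping integer ids entirely, but that would mean changing the model's input layer rather than just adding a tokenizer.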