


	by milansuk 803 days ago
	This is an implementation of a transformer, and the README presents it as text->text. Tokens are just integers going in and out. Is it possible to use it to train other types of LLMs (text->image, image->text, speech->text, etc.)?

2 comments

_giorgio_ 803 days ago

Yes, anything can be an input token.

Patch of pixels ---> token
Fragment of input audio ---> token
etc.
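
To make the "patch of pixels ---> token" idea concrete, here is a minimal sketch in plain Python. It assumes a tiny grayscale image stored as a list of rows, and it uses a deliberately crude quantizer (bucketing each patch's mean brightness) that I made up for illustration. The names `patchify` and `tile_to_token` are hypothetical. Real systems typically use a learned codebook or a learned linear projection of each patch instead, so treat this only as a picture of the data flow.

```python
def patchify(image, patch):
    """Split a 2D grid (list of rows) into non-overlapping patch x patch tiles.

    Tiles are returned in row-major order, each flattened to a list of pixels.
    """
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must divide evenly into patches"
    tiles = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return tiles


def tile_to_token(tile, levels=4, max_value=255):
    """Toy quantizer (an assumption, not a real method): bucket the tile's
    mean brightness into one of `levels` integer ids."""
    mean = sum(tile) / len(tile)
    return min(int(mean * levels / (max_value + 1)), levels - 1)


# A 4x4 grayscale image, split into four 2x2 patches.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [100, 100, 200, 200],
    [100, 100, 200, 200],
]

tokens = [tile_to_token(t) for t in patchify(image, 2)]
print(tokens)  # [0, 3, 1, 3]
```

The result is a plain sequence of integers, which is the same kind of input a text tokenizer would produce.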


bootsmann 803 days ago

The transformer itself just takes arrays of numbers and turns them into arrays of numbers. What you are interested in is the process that happens before and after the transformer.
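
A rough sketch of that "before and after" split, in plain Python. Everything here is a stand-in I invented for illustration: `encode_audio` is a toy uniform quantizer, `stand_in_model` is a placeholder function rather than a transformer, and `VOCAB` is a made-up three-word vocabulary. The output text is meaningless. The point is only that the middle step sees integers on both sides, while the modality-specific work lives in the encoder and decoder.

```python
def encode_audio(samples, levels=16):
    """Before the model: map floats in [-1.0, 1.0] to ids in [0, levels - 1]."""
    ids = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp out-of-range samples
        ids.append(min(int((s + 1.0) / 2.0 * levels), levels - 1))
    return ids


# Hypothetical output vocabulary for the text side.
VOCAB = ["the", "cat", "sat"]


def stand_in_model(ids):
    """Placeholder for the transformer: integers in, integers out.

    A trained model would go here; this just folds ids into the vocab range.
    """
    return [i % len(VOCAB) for i in ids]


def decode_text(ids):
    """After the model: map output ids back to words."""
    return " ".join(VOCAB[i] for i in ids)


audio = [-1.0, 0.0, 0.5, 1.0]
input_ids = encode_audio(audio)          # speech -> integers
output_ids = stand_in_model(input_ids)   # integers -> integers
print(decode_text(output_ids))           # integers -> text
```

Swapping `encode_audio` for an image or text encoder, or `decode_text` for an image decoder, changes the task without touching the middle step.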
