|
|
|
|
|
by WarmWash
61 days ago
|
|
Language really only exists at the input and output surfaces of the models. In the middle it's all numerical values. Which you might be quick in relating to just being a numeric cypher of the words, which while not totally false, it misses that it is also a numeric cypher of anything. You can train a transformer on anything that you can assign tokens to. |
|
This applies to any transformer-based architecture including JEPA which tries to make the tokens predict some kind of latent space (in which I've separately heard arguments as to why the two are equivalent, but that's a different discussion.)