by latentnumber
389 days ago
I would agree with this if the LLM never really modified the initial linear embeddings, but the non-linearities in the MLP layers and the position- and correlation-dependent mixing in the attention layers mean that things are not so simple. I'm pretty sure there are papers showing that transformers represent compositionality and similar structure.
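
To make the non-linearity point concrete, here is a toy sketch in plain Python. The weights `W1` and `W2`, the inputs `a` and `b`, and all helper names are made up for illustration and do not come from any real model. It only shows that a linear map satisfies f(a + b) = f(a) + f(b), while a single ReLU hidden layer already breaks that, so whatever linear structure the input embeddings have need not survive the MLP.

```python
def linear(W, x):
    """Matrix-vector product: a purely linear map."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU non-linearity."""
    return [max(0.0, a) for a in v]

def mlp(W1, W2, x):
    """One hidden layer with ReLU: W2 @ relu(W1 @ x)."""
    return linear(W2, relu(linear(W1, x)))

def add(u, v):
    """Element-wise vector sum."""
    return [x + y for x, y in zip(u, v)]

def max_abs_diff(u, v):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(u, v))

# Hypothetical toy weights and "embeddings", chosen by hand.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
a = [1.0, 0.0]
b = [0.0, 1.0]

# Additivity holds for the linear layer alone...
linear_gap = max_abs_diff(linear(W1, add(a, b)),
                          add(linear(W1, a), linear(W1, b)))

# ...but not once the ReLU sits between the two linear layers.
mlp_gap = max_abs_diff(mlp(W1, W2, add(a, b)),
                       add(mlp(W1, W2, a), mlp(W1, W2, b)))

print(linear_gap, mlp_gap)
```

With these particular weights the MLP computes |x1 - x2|, so it maps `a + b` to 0 while mapping `a` and `b` to 1 each. Attention adds a further input-dependent mixing step on top of this, which the sketch does not try to model.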