| HN Mirror

	by latentnumber 389 days ago
	I would agree with this if the LLM never really modified the initial linear embeddings, but the non-linearities in the MLP layers and the position- and correlation-based adjustments in the attention layers mean that things are not so simple. I’m pretty sure there are papers showing that transformers represent compositionality and similar structure.
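
	A toy sketch of the non-linearity point, not taken from any real model: the weights `W1` and `W2`, the inputs, and all helper names below are made up for illustration. It compares a purely linear stack with the same stack plus a ReLU, and checks whether the output for `a + b` equals the sum of the outputs for `a` and `b`.

```python
# Toy 2-d example: additive structure survives linear layers
# but not a ReLU MLP. All weights and inputs are hypothetical.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU."""
    return [max(0.0, t) for t in v]

def add(a, b):
    """Element-wise vector sum."""
    return [x + y for x, y in zip(a, b)]

W1 = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical first-layer weights
W2 = [[1.0, 2.0]]                # hypothetical output weights

def linear_only(x):
    # Two linear layers with no activation: still one linear map.
    return matvec(W2, matvec(W1, x))[0]

def mlp(x):
    # Same layers with a ReLU in between: no longer linear.
    return matvec(W2, relu(matvec(W1, x)))[0]

a = [1.0, 0.0]
b = [0.0, 1.0]

# Gap between f(a + b) and f(a) + f(b); zero means additivity holds.
linear_gap = linear_only(add(a, b)) - (linear_only(a) + linear_only(b))
mlp_gap = mlp(add(a, b)) - (mlp(a) + mlp(b))

print("linear gap:", linear_gap)
print("mlp gap:", mlp_gap)
```

	The linear stack leaves a gap of zero, while the ReLU version does not, so vector arithmetic that holds on the input embeddings need not hold after even one MLP layer. This says nothing about what trained transformers actually learn; it only shows why the linear picture cannot be assumed to carry through.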