| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by nodja 67 days ago
	Anyone familiar with the literature knows if anyone tried figuring out why we don't add "speaker" embeddings? So we'd have an embedding purely for system/assistant/user/tool, maybe even turn number if i.e. multiple tools are called in a row. Surely it would perform better than expecting the attention matrix to look for special tokens no?