Language models transmit behavioural traits through hidden signals in data

	Language models transmit behavioural traits through hidden signals in data (nature.com)
	4 points by armcat 57 days ago

2 comments

Related to this: https://www.nature.com/articles/d41586-026-00906-0 (LLMs can subliminally learn malicious behavior through distilling)

That explains the high performance of distilled models, then (e.g. Chinese ones).