Anthropic NLAs translate LLM activations to human-readable text for safety

	Anthropic NLAs translate LLM activations to human-readable text for safety (presciente.com)
	1 point by sebastianperezr 46 days ago