| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by energy123 358 days ago
	Another way to put it: There's a single "this is not bad" circuit that stop lots of unrelated bad things. Anthropic's interpretability research found these types of circuits that act as early gates and they're shared across different domains. Which makes sense given how compressed neural nets are. You can't waste the weights.