	by leogao 749 days ago
	Note that we focus on random positive activations, which are less susceptible to interpretability illusions than top activations (but also look less impressive as a result). We also provide access to random uncherrypicked features, whereas Anthropic does not. We made these choices deliberately to give as accurate an impression of autoencoder feature quality as possible. Also note that GPT-4 is a more powerful model than Sonnet, which makes it harder to train autoencoders with features of the same quality.
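
	To make the distinction between top activations and random positive activations concrete, here is a minimal sketch on toy data. The activation values and the function names `top_activations` and `random_positive_activations` are made up for illustration; this is not the code behind the actual feature viewer.

```python
import random


def top_activations(acts, k):
    """Return indices of the k largest activations (the 'top activations' view)."""
    return sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]


def random_positive_activations(acts, k, seed=0):
    """Return up to k indices drawn uniformly from examples with activation > 0."""
    positive = [i for i, a in enumerate(acts) if a > 0]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return sorted(rng.sample(positive, min(k, len(positive))))


# Hypothetical per-example activations for a single autoencoder feature.
acts = [0.0, 0.1, 9.5, 0.0, 0.3, 8.7, 0.2, 0.0, 0.4, 0.05]

print(top_activations(acts, 2))              # only the two strongest examples
print(random_positive_activations(acts, 3))  # a uniform draw over every firing example
```

	In this toy data the top-2 view shows only the two strong examples, while the random positive sample is dominated by the many weak activations, which is roughly why the latter tends to look less impressive but gives a more representative picture of when the feature fires.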