HN Mirror


	by jacobr1 1140 days ago
	There is a longer-term problem of trusting the explainer system, but in the near term that isn't really a concern. The bigger value here for now is _explicability_ rather than alignment per se. Good explicability might provide insights into the design and architecture of LLMs in general, which in turn may enable better design of alignment schemes.