by jacobr1
1140 days ago
There is a longer-term problem of trusting the explainer system, but in the near term that isn't really a concern. The bigger value here in the near term is _explicability_ rather than alignment per se. Good explicability might provide insights into the design and architecture of LLMs in general, which in turn may enable better design of alignment schemes.