I'm sure you could get an LLM to create a plausible sounding justification for every decision? It might not be related to the real reason, but coming up with text isn't the hard part there surely
It seems solvable if you treat it as an architecture problem. I've been using LangGraph to force the model to extract and cite evidence before it runs any scoring logic. That creates an audit trail based on the flow rather than just opaque model outputs.
It's not. If you actually look at any chain-of-thought stuff long enough, you'll see instances where what it delivers directly contradicts the "thoughts."
If your AI is *ist in effect but told not to be, it will just manifest as highlighting negative things more often for the people it has bad vibes for. Just like people will do.
Yes, they will, they'll rationalize whatever. This is most obvious w/ transcript editing where you make the LLM 'say' things it wouldn't say and then ask it why.
I believe the point is that it's much easier to create a plausible justification than an accurate justification. So simply requiring that the system produce some kind of explanation doesn't help, unless there are rigorous controls to make sure it's accurate.
That's a great point: funny, sad, and true.
My AI class predated LLMs. The implicit assumption was that the explanation had to be correct and verifiable, which may not be achievable with LLMs.