Ask HN: How are you checking if your LLM is giving customers the right answer?

2 points by navaed01 384 days ago

Something that’s been bothering me is observability for LLMs: how do you check that the model is giving customers the right answer?

There seem to be multiple failure points: hallucinations, partial responses (missing facts), wrongly saying that information does not exist, and accuracy that depends on what is asked and how it is phrased.

How are you measuring this in production today?
- Thumbs up/down seems like a weak signal.
- Running a sample of ‘known queries’ assumes you know what is being asked.
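
To make the ‘known queries’ idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `GOLDEN_SET` entries, the `ask` callable that wraps whatever model or API you use, and the `fake_llm` stand-in. It checks only whether each answer contains the facts you expect, using a plain substring match, so it catches missing facts but not hallucinated extras.

```python
from typing import Callable

# Hypothetical golden set: each known query lists facts the answer must contain.
GOLDEN_SET = [
    {"query": "What is the refund window?",
     "required_facts": ["30 days", "original payment method"]},
    {"query": "Do you ship to Canada?",
     "required_facts": ["Canada"]},
]


def missing_facts(answer: str, required_facts: list[str]) -> list[str]:
    """Return the required facts that do not appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [fact for fact in required_facts if fact.lower() not in lowered]


def run_golden_set(ask: Callable[[str], str], golden_set=GOLDEN_SET) -> dict:
    """Send every known query through `ask` and report the pass rate and failures."""
    failures = []
    for case in golden_set:
        missing = missing_facts(ask(case["query"]), case["required_facts"])
        if missing:
            failures.append({"query": case["query"], "missing": missing})
    total = len(golden_set)
    return {"pass_rate": (total - len(failures)) / total, "failures": failures}


def fake_llm(query: str) -> str:
    """Stand-in for a real model call (assumption); replace with your own client."""
    if "refund" in query.lower():
        return "Refunds are accepted within 30 days."  # omits the payment-method fact
    return "Yes, we ship to Canada."


report = run_golden_set(fake_llm)
```

With the stand-in above, the refund answer fails because it leaves out one required fact, which is the "partial response" case. The limitation from the list still applies: this only covers questions you already thought to write down.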

What have you tried that works for you?