| HN Mirror


	by maxalbarello 108 days ago
	Also wondering how to evaluate agentic pipelines. For instance, I generated memories from my ChatGPT conversation history; how do I know whether they are accurate? I would like a single number to optimize the pipeline against, but I find it hard to figure out what that number should measure.

1 comment

yelmahallawy 107 days ago

I think this is actually a common problem: figuring out what to measure and how to measure it is not black and white. What I do is score outputs along a few dimensions (these may or may not fit your use case): relevance, instruction following, clarity, hallucination rate, etc. Even then, it is hard to measure something like 'clarity'.
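If a single number is still wanted, one possible approach is to fold the per-dimension scores into a weighted average. The sketch below is only an illustration: the weights, the `composite_score` name, and the assumption that every dimension is already scored in [0, 1] are all hypothetical, and how each dimension gets scored in the first place (human labels, an LLM judge, heuristics) is left open.

```python
# Hypothetical weights for the dimensions mentioned above; tune for your use case.
WEIGHTS = {
    "relevance": 0.4,
    "instruction_following": 0.3,
    "clarity": 0.1,
    "hallucination_rate": 0.2,
}

# Dimensions where a lower raw value is better and must be inverted.
LOWER_IS_BETTER = {"hallucination_rate"}


def composite_score(scores: dict[str, float]) -> float:
    """Combine per-dimension scores (each assumed to be in [0, 1]) into one number."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = scores[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        if name in LOWER_IS_BETTER:
            value = 1.0 - value  # so that higher always means better
        total += weight * value
    # Normalize so the result stays in [0, 1] even if the weights do not sum to 1.
    return total / sum(WEIGHTS.values())


def pipeline_score(examples: list[dict[str, float]]) -> float:
    """Average the composite score over a non-empty set of evaluated examples."""
    return sum(composite_score(s) for s in examples) / len(examples)
```

The weights encode a judgment about what matters, so the composite is only as meaningful as that judgment. Keeping the per-dimension scores next to it makes it easier to see which dimension moved when the number changes.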
