Hacker News new | ask | show | jobs
by hommes-r 202 days ago
"Because there’s too much you need to feed into it" - what does the author mean by this? If it is the amount of data, then I would say sampling needs to be implemented. If that's the extent of the information required from the agent builder, I agree that an LLM-as-a-judge e2e eval setup is necessary.

In general, a more generic eval setup is needed, with minimal requirements from AI engineers, if we want to move forward from Vibe's reliability engineering practices as a sector.