Hacker News
by DelightOne 341 days ago
What does an e2e test for less capable LLMs look like? Do you call each LLM one by one? Aren't these tests flaky by the nature of LLMs, and how do you deal with that?