Hacker News
by kolinko 48 days ago
There were few systems like Claude in the past, so the testing rulebook is not really written yet. And it is far from obvious.
1 comment

LLM evals are well established; are these not applicable here?