Hacker News
by
kolinko
48 days ago
There were few systems like Claude in the past, so the testing rulebook is not really written yet, and it is far from obvious.
1 comment
sockgrant
48 days ago
LLM evals are well established. Are these not applicable here?