Hacker News
by alex43578, 62 days ago
And I think human-written tests at that. If the LLM is blind to failure mode X, can it be relied on to write a test that actually evaluates X?