Hacker News new | ask | show | jobs
by saint_yossarian 307 days ago
One thing that comes to mind: You still have to verify that the tests are exhaustive, and that the code isn't just gaming specific test scenarios.

I guess fuzzing and property-based testing could mitigate this to some extent.

1 comments

Yes, we are getting there. I think compiler is a bigger problem than unit tests given most verticals don't even have that. With unit tests, there would be some reward hacking but would be controlled at the model level + tests. (this is one of the reason i dont believe in transformer based llm as a judge for a verifier)