Hacker News new | ask | show | jobs
by hrpnk 492 days ago
What's missing in these examples are evals and any advice on creating a verification set that can be used to assert that the system continues to work as designed. Models and their prompting patterns change; one cannot just pretend that a 1-time automation will continue to work indefinitely when the environment is constantly changing.