Hacker News
by shahules 469 days ago
It's an interesting article, and I agree with some of the points you brought up. But here are a couple I disagree with:

1. The article uses "evals" throughout in the sense of LLM benchmarking, but that misses the point. One can effectively evaluate any AI system by building custom evals.

2. The purpose of evals is to help devs systematically improve their AI systems (at least, that's how we look at it), not any of the purposes listed in your article. It's not a one-time thing; it's a practice, like the scientific method.

1 comment

2. I think improvement is the next step. KNOWING whether the system even performs according to set criteria is more important. You can't tell that a system is improving if you don't have any evals to measure it.
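
To make both points concrete, here is a minimal sketch of a custom eval that first checks whether a system meets a set criterion and then compares two versions. Everything in it is hypothetical and not taken from the article: the cases, the stand-in `system_v1`/`system_v2` functions, and the 0.6 threshold are assumptions chosen only for illustration.

```python
from typing import Callable

# Hypothetical eval set: each case pairs an input with a check encoding one criterion.
EvalCase = tuple[str, Callable[[str], bool]]

CASES: list[EvalCase] = [
    ("What is 2 + 2?", lambda out: "4" in out),
    ("Say hello in French.", lambda out: "bonjour" in out.lower()),
    ("Name the capital of Japan.", lambda out: "tokyo" in out.lower()),
]


def pass_rate(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases whose output satisfies its check."""
    passed = sum(1 for prompt, check in cases if check(system(prompt)))
    return passed / len(cases)


# Stand-in "systems"; in practice these would call a model or pipeline.
def system_v1(prompt: str) -> str:
    return "4" if "2 + 2" in prompt else "I don't know"


def system_v2(prompt: str) -> str:
    answers = {"2 + 2": "4", "French": "Bonjour!"}
    return next((a for key, a in answers.items() if key in prompt), "I don't know")


THRESHOLD = 0.6  # assumed acceptance criterion, not from the article

for name, system in [("v1", system_v1), ("v2", system_v2)]:
    rate = pass_rate(system, CASES)
    print(f"{name}: pass rate {rate:.2f}, meets criteria: {rate >= THRESHOLD}")
```

The same `pass_rate` number answers "does it meet the criteria?" for one version and "did it improve?" across versions, which is why the measurement has to exist before any improvement can be claimed.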