by shahules
469 days ago
It's an interesting article, and I agree with some of the points you brought up. But here are a couple I disagree with:

1. The article uses "evals" throughout in the sense of LLM benchmarking, but that misses the point. One can effectively evaluate any AI system by building custom evals.

2. The purpose of evals is to help devs systematically improve their AI systems (at least that's how we look at it), not any of the purposes listed in your article. It's not a one-time thing; it's a practice, like the scientific method.