by alexhans
99 days ago
> The vibes are not enough. Define what correct means. Then measure.

Pretty much. I've been advocating this for a while. For automation you need intent, and for comparison you need measurement. Blast radius/risk profile also matters for understanding how much you need to cover upfront.

The author mentions evaluations, which in this context are often called AI evals [1]. One thing I'd love to see is those evals becoming a common language of actually provable user stories, instead of there being a disconnect between different types of roles, e.g. a scientist, a business guy and a software developer. The more we can speak a common language, and easily write and maintain these no matter which background we have, the easier it'll be to collaborate, to empower people, and to move fast without losing control.

[1] https://ai-evals.io/ (or the practical repo: https://github.com/Alexhans/eval-ception)
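
To make "define what correct means, then measure" a bit more concrete, here is a rough sketch of an eval written as a checkable user story. Everything in it is hypothetical and made up for illustration: `EvalCase`, `run_evals`, and `fake_system` are not taken from ai-evals.io or the eval-ception repo, and `fake_system` just stands in for whatever model or agent you would actually call.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class EvalCase:
    """One user story plus an explicit definition of 'correct' (hypothetical structure)."""
    story: str                          # the intent, in plain language
    prompt: str                         # input sent to the system under test
    is_correct: Callable[[str], bool]   # what "correct" means for this story


def run_evals(system: Callable[[str], str],
              cases: List[EvalCase]) -> Tuple[List[Tuple[str, bool]], float]:
    """Run every case against the system and return per-story results plus a pass rate."""
    results = [(case.story, case.is_correct(system(case.prompt))) for case in cases]
    pass_rate = sum(ok for _, ok in results) / len(results)
    return results, pass_rate


def fake_system(prompt: str) -> str:
    """Stand-in for a real model or agent call (assumption, for illustration only)."""
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days."
    return "I don't know."


cases = [
    EvalCase(
        story="A customer can learn the refund window",
        prompt="What is your refund policy?",
        is_correct=lambda out: "30 days" in out,
    ),
    EvalCase(
        story="The assistant admits when it lacks an answer",
        prompt="What is the CEO's shoe size?",
        is_correct=lambda out: "don't know" in out.lower(),
    ),
]

results, pass_rate = run_evals(fake_system, cases)
for story, ok in results:
    print(f"{'PASS' if ok else 'FAIL'}: {story}")
print(f"pass rate: {pass_rate:.0%}")
```

The `story` field is the part a scientist, a business person and a developer can all read and argue about; `is_correct` is where the agreed meaning of "correct" gets pinned down, and the pass rate is the measurement you can compare between runs.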