by cootsnuck
35 days ago
Yup, spot on. There's a capability-reliability gap that the industry doesn't like to talk about much. It often feels like the AI industry keeps glossing over the fact that capability and reliability are fundamentally different qualities. We tend to use "accurate" and "reliable" interchangeably, but they describe different things: a model can ace a benchmark (capability/accuracy) and still be a liability in production (reliability).

Just look at the recent reactions to yet another METR release showing improved capabilities. The less-discussed part is that their measure is the task time horizon at a 50% success rate, and the even less-discussed secondary measure, at an 80% success rate, gives a drastically shorter time horizon. https://metr.org/

I implement AI systems for enterprises, and I don't know of any that would ever be okay with 80% reliability (let alone 50%).