by alistairSH
76 days ago
How is success defined in those metrics? Is success "perfect - can deploy to prod immediately" or "saved some arbitrary amount of engineering time"? Anecdotal experience from my team of 15 engineers is that we rarely get "perfect", but we do get enough to yield massive time savings across several common problem domains.

What amazes me is how fast LLMs are progressing. And it still feels like early days (!).
For methodology, though, I would check out the METR website; they've published their results.