Hacker News
by _pdp_ 3 days ago
Do you know what would be cool?

It would be cool to measure models on their RAW performance in terms of ROI: not some benchmark, but something meaningful like "we used this model to solve X."

That would be a massive mindset shift and might justify the token expenditure.

1 comment

Aren't benchmarks exactly that?

"We used the AI to solve a given problem with x% adherence/quality/correctness"?