by ACCount37
217 days ago
That's because they are as close to an "objective measure of capabilities" as anything we're ever going to get. Without benchmarks, you're down to evaluating model performance based on vibes and vibes only, which plain sucks. With benchmarks, you have numbers that correlate with capabilities somewhat.