by NiloCK
183 days ago
I appreciate horizon expansion as a fundamental metric, but duration seems like too crude a measure. We used to like it when computers were fast. An infinitely unscrupulous model provider could double this five-hour result by cutting your output tokens/second in half! This isn't only a question of gaming the metric: the very strong current small-fast models (4.5 Haiku, Gemini 3 Flash) have no hope of being measured fairly against this. They will succeed or fail much sooner simply because they generate tokens much faster. How about something like total output token count as the "long term horizon" metric instead?
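
To make the arithmetic concrete, here is a rough sketch. The token count and generation rates are made-up numbers, not measurements of any model, and it assumes wall-clock time is dominated by output generation (no tool-call or prefill latency):

```python
def wall_clock_hours(output_tokens: int, tokens_per_second: float) -> float:
    """Hours needed to emit `output_tokens` at a steady generation rate."""
    return output_tokens / tokens_per_second / 3600


# Hypothetical task: the same work, i.e. the same number of output tokens.
TASK_TOKENS = 900_000

fast = wall_clock_hours(TASK_TOKENS, tokens_per_second=50.0)
slow = wall_clock_hours(TASK_TOKENS, tokens_per_second=25.0)

# Halving the rate doubles the measured duration, while the token
# count (the proposed alternative metric) does not change at all.
print(f"fast: {fast:.1f} h, slow: {slow:.1f} h, tokens: {TASK_TOKENS}")
```

Under those assumptions the duration metric moves with serving speed alone, while the output token count stays put.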