by yohji1984
2 days ago
I'm wondering why all these token-saving solutions benchmark exclusively on simple Q&A tasks. If their tools really saved money on real, long-running programming tasks, they would surely have published those benchmark results instead of just Q&A tests, especially since a simple code-editing benchmark with a hidden eval harness is very easy to design. Personally, I very rarely ask a coding agent questions without any code editing involved.