by artursapek
51 days ago
Good idea about the leaderboard for open vs closed models! Point taken on using an agent. I went that route because part of the goal for this benchmark is to inform which models I push in my agentic word processor, which uses tools for focused proofreading/editing. It's much faster and generally cheaper to use tools for surgical changes on large documents, rather than having the model spit out the entire document with all issues corrected. So yes, I am trying to measure agentic abilities here. A simple one-pass full-rewrite test would also make an interesting benchmark, though.
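
To make the cost argument concrete, here is a minimal sketch of a surgical edit tool. Everything in it is an assumption for illustration: `apply_edit` is a hypothetical name, not the word processor's actual tool API, and character counts stand in as a rough proxy for output tokens.

```python
def apply_edit(document: str, old: str, new: str) -> str:
    """Hypothetical edit tool: replace `old` with `new`.

    Requires `old` to match exactly once so the edit is unambiguous.
    """
    count = document.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for {old!r}, found {count}")
    return document.replace(old, new, 1)


# Build a large document with a single typo buried in the middle.
filler = "All work and no play makes Jack a dull boy. " * 200
document = filler + "This sentance has a typo. " + filler

# Surgical route: the model only has to emit the tool arguments.
old, new = "sentance", "sentence"
fixed = apply_edit(document, old, new)
edit_chars = len(old) + len(new)

# Full-rewrite route: the model would have to emit the whole corrected text.
rewrite_chars = len(fixed)

print(f"edit call: {edit_chars} chars, full rewrite: {rewrite_chars} chars")
```

The exactly-one-match check is one possible design choice: it forces the model to quote enough surrounding text to pin down the location, at the price of a retry when the quoted text is ambiguous.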