by stephantul
5 hours ago
We’ve been on the receiving end of this complaint with Semble. I think it is a valid complaint, but constructing a benchmark for this kind of thing is very difficult and expensive, because you have to cover every (harness) x (model) x (MCP/CLI) combination. With traditional ML tooling, not showing benchmarks was usually a red flag. But for LLM tooling, I’m not so sure.