Show HN: I made a tiny, playable benchmark where LLMs compete head-to-head
(llm-fighter.com)
2 points
by yz-yu
315 days ago
TL;DR: LLM Fighter is a small, open-source, playable benchmark for agentic behavior. You bring an OpenAI-compatible API; the demo runs in the browser. It sets up head-to-head “battles” that stress tool use, planning, and efficiency, and shows step-by-step logs you can download.
What it does well: gives a quick, honest feel for how agents act under the same rules.
What it’s not: a formal academic benchmark or a single “score”.
Why I built it: I wanted something you can play in minutes and still learn from.