Show HN: I made a tiny, playable benchmark where LLMs compete head-to-head

Y	Hacker News new \| ask \| show \| jobs

	Show HN: I made a tiny, playable benchmark where LLMs compete head-to-head (llm-fighter.com)
	2 points by yz-yu 361 days ago
	TL;DR: LLM Fighter is a small, open-source, playable benchmark for agentic behavior. You bring an OpenAI-compatible API; the demo runs in the browser. It creates head-to-head “battles” that stress tools, planning, and efficiency, and shows step-by-step logs you can download. What it does well: quick, honest feel for how agents act under the same rules. What it’s not: a formal academic benchmark or a single “score”. Why I built it: I wanted something you can play in minutes and still learn from.