Hacker News
by levmiseri 80 days ago
For a loosely similar 'benchmark', I recently tried testing major LLMs on my coding game (models write code controlling their units in a 1v1 RTS):
https://yare.io/ai-arena