Hacker News
by kqr 177 days ago
I have a draft doing this with text adventures: https://entropicthoughts.com/updated-llm-benchmark