Show HN: CATArena – Evaluating LLM agents via dynamic environment interactions (github.com)
3 points by jinqueeny 166 days ago