by djfergus 69 days ago
Reminds me of the Terminus agent/harness on the terminal-bench coding benchmark - they just send keystrokes to a tmux session. They score pretty well.

https://www.tbench.ai/news/terminus
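
For anyone curious what "send keystrokes to a tmux session" can look like, here's a rough sketch. To be clear, this is not Terminus's actual code: the helper names and the session name are made up for illustration. The only real pieces are the `tmux send-keys` and `tmux capture-pane` subcommands and Python's `subprocess` module.

```python
import subprocess


def send_keys_cmd(session, keys, press_enter=True):
    """Build the argv for `tmux send-keys` (hypothetical helper name)."""
    cmd = ["tmux", "send-keys", "-t", session, keys]
    if press_enter:
        # "Enter" is a tmux key name, sent as a separate keystroke.
        cmd.append("Enter")
    return cmd


def capture_pane_cmd(session):
    """Build the argv for `tmux capture-pane`; -p prints the pane to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", session]


def run_in_session(session, command):
    """Type a command into an existing tmux session and read the screen back.

    Assumes tmux is installed and `session` already exists. The capture
    happens immediately, so a slow command may not have printed yet.
    """
    subprocess.run(send_keys_cmd(session, command), check=True)
    result = subprocess.run(
        capture_pane_cmd(session), check=True, capture_output=True, text=True
    )
    return result.stdout
```

With a session started via `tmux new-session -d -s agent`, calling `run_in_session("agent", "ls -la")` would type the command and return whatever is on screen. A real harness would presumably need to wait or poll before reading the pane.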