Hacker News
Cyber Model Arena (wiz.io)
2 points by galnagli 123 days ago
2 comments

I'm wondering why they decided to airgap the models inside Docker containers. IMO, this would have been a better comparison if the models had been allowed to make tool calls.
General-Purpose Cyber Benchmark for AI Agents and their Models