by mkaszkowiak 4 hours ago

What was your approach to benchmarking an adversarial agent?

I ran into this open problem myself (in a different domain): the search space can be really wide, so it's hard to measure results for non-trivial tasks.

Would be really interested if you can share your eval approach :)