Hacker News
by dataviz1000 61 days ago
The red team adversaries are so effective. If Claude is blind to a bug, the bug won't surface when the same model plays the red-team adversary. It takes a different model, and gpt-5.5 is great for that. Yesterday I tried, for the first time, using gpt-5.5 as an adversary against the tests themselves. Later I thought it would be interesting to create a trickster agent that copies the entire project into /tmp/, so it controls every aspect of it, and then breaks the code. Claude insists this is called mutation testing. The agent would create regressions and then run all the tests. In the end it was able to build an effective test harness unsupervised.
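For anyone who wants to see the shape of the trickster idea, here is a rough sketch of the copy-into-a-temp-dir, break-the-code, run-the-tests loop. It is not my actual agent setup: the `MUTATIONS` table, the `mutation_survivors` name, and the idea of passing the test command as a list are all assumptions made for illustration, and the "mutation" is just a crude textual swap of the first matching operator.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical mutation table: each pair swaps one operator for another.
MUTATIONS = [("+", "-"), ("==", "!="), ("<", ">")]


def mutation_survivors(project_dir, target_file, test_cmd):
    """Return the mutations that the test suite failed to catch.

    For each mutation, the whole project is copied into a fresh temporary
    directory, one operator in target_file is swapped there, and test_cmd
    is run inside the copy. The original project is never modified.
    """
    project_dir = Path(project_dir)
    source = (project_dir / target_file).read_text()
    survivors = []
    for old, new in MUTATIONS:
        if old not in source:
            continue  # nothing to mutate for this operator
        with tempfile.TemporaryDirectory() as tmp:
            copy = Path(tmp) / "project"
            shutil.copytree(project_dir, copy)
            # Introduce the regression: replace the first occurrence only.
            (copy / target_file).write_text(source.replace(old, new, 1))
            result = subprocess.run(test_cmd, cwd=copy, capture_output=True)
            if result.returncode == 0:
                # Tests still pass on broken code, so the mutant survived.
                survivors.append((old, new))
    return survivors


if __name__ == "__main__":
    # Usage: python mutate.py <project_dir> <target_file> <test_script>
    project, target, test_script = sys.argv[1:4]
    for old, new in mutation_survivors(project, target, [sys.executable, test_script]):
        print(f"tests did not catch: {old!r} -> {new!r}")
```

Every surviving mutation is a regression the tests would have let through, which is the list you would hand back to the agent writing the tests.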
1 comment

100%. I have a /codex skill that shells out to the codex CLI using GPT 5.5 xhigh. My /red-team skill uses that always.