|
|
|
|
|
by enraged_camel
64 days ago
|
|
>> We saw yesterday that expert orchestration around small, publicly available models can produce results on the level of the unreleased model. This is false. Yesterday's article did not actually show this, and there are many comments in the discussion from actual security people (like tptacek) pointing that out. |
|
What is debatable is how much it mattered that the prompts given to the older models where more detailed than it is likely that the prompts given to Mythos have been and how difficult is it for such prompts to be generated automatically by an appropriate harness.
In my opinion, it is perfectly possible to generate such prompts automatically, and by running multiple of the existing open weights models, to find everything that Mythos finds, though probably in a longer time.
Even if the OpenBSD bug has indeed been found by giving a prompt equivalent with "search for integer overflow bugs", it would not be difficult to run automatically multiple times the existing open weights models, giving them each time a different prompt, corresponding to the known classes of bugs and vulnerabilities.
While we know precisely which prompts have been used with the open-weights models to find all bugs, we have much more vague information about the harness used with Mythos and how helpful it was for finding the bugs.
Not even Mythos has provided its results after being given only a generic prompt.
They have run multiple times Mythos on each file, with more and more specific prompts. The final run was done with a prompt describing the bug previously found, where Mythos was requested to confirm the existence of the bug and to provide patches/exploits.
See: https://red.anthropic.com/2026/mythos-preview/
So the authors of that article are right, that for finding bugs an appropriate harness is essential. Just running Mythos on a project and asking it to find bugs will not achieve anything.