by cassianoleal
38 days ago
Hi! First of all, thanks for your incredibly thoughtful and enlightening answers, and most of all for helping keep Firefox alive.

You said:

> Still, some people would still be disclosing, and not many were filing quality bugs pre-Mythos. Some were, but it was a trickle compared to post-Mythos.

How much of this could be just due to focus? I.e., prior to the partnership with Anthropic to test Mythos Preview, has there ever been a similarly focused project, specifically trying to find security bugs in Firefox?
There are two things mixed together here. There is targeted scanning that was done by both Anthropic and Mozilla employees, using first Opus and then Mythos. Then there are other non-employee security researchers using AI to find and file bugs, motivated mostly by bug bounties.
The researchers were filing a steady trickle of bugs, presumably found using Opus 4.6. (Or rather, I saw a steady trickle after other people triaged them; I imagine the incoming stream was a lot busier.) My impression is that those have mostly dried up now. That could be bias in my sample (I only see a slice of incoming bugs, so my anecdata aren't that strong), or a result of the restrictions added to the generally available models, or a result of there being less to find now that we've fixed so many of the issues found by company-backed bughunts. Or a combination of all three.
I guess my opinion is mostly driven by the difference in the quality and magnitude of bugs coming in from the company-backed scans pre- and post-Mythos. With Opus, there was an initial rush, but then it mostly died down. (For our group. For other groups, it was a series of waves that they never quite made it over before the next one came crashing in.) With Mythos, it was a larger wave and the quality of the bugs was higher. Two quantitative differences that ended up feeling like a qualitative change. So it's my underinformed personal opinion, but to me it feels like: yes, you could continue to find more bugs using a roughly Opus 4.6-strength model, but not that many and not cheaply, and the success rate is going to depend a lot on the harness. In comparison, I don't think we've seen the end of the Mythos wave, and my sense is that Mythos requires much less in the way of a harness.
It feels like the bitter lesson is playing itself out again, which I kinda hate because I want human ingenuity and cleverness to make an important difference, even after the next model has seen what the humans are coming up with.