>> Test it yourself, GPT 120B OSS is cheap and available. BTW, this is why with this bug, the stronger the model you pick (but not strong enough to discover the true bug), the less likely it is to claim there is a bug.

I guess this is the crux of the debate. All the claims compare freely available models with a model that is available only to limited customers (Mythos). The problem here is the phrase "better model". Better how? Is it trained specifically on cybersecurity? Is it simply a larger model with a higher token/thinking budget? Is it a better harness/scaffold? Is it simply a better prompt? I don't doubt that some models are stronger than others: a Gemini Pro or a Claude Opus has more parameters and larger context sizes, and was probably trained for longer and on more data, than its smaller counterpart (Flash and Sonnet, respectively).

Unless we know the exact experimental setup (which in this case is impossible, because Mythos is completely closed off and not even accessible via API), all of this is hand-wavy. Anthropic is definitely not going to reveal their setup, because whether or not there is any secret sauce, there is more value in letting people's imaginations run wild and the marketing machine do its work. Anthropic must be jumping for joy at all the free publicity they are getting.