by zmmmmm
38 days ago
I'm very curious where we will saturate the curve on "enough" intelligence for coding. At some point, you can let a less smart model hammer at a problem for longer and get to the same result, and as long as you are not involved, it comes to the same thing. I feel like DeepSeek V4 Pro is nearly there. Maybe Flash is too. Once we hit that point, I am curious how much of Anthropic's current business model falls apart. So far it's always been clear that you just pay for the most intelligent model you can get because it is worth it. It now seems clear to me that there is limited runway on that concept. It is just a question of how long that runway is. I honestly wonder how much of their frantic push to broaden out into enterprise / productivity is because they already see this writing on the wall.
|
I can't even let gpt 5.5 xhigh hammer at problems for more than 30 minutes before it starts patching the tests to make them pass, or implementing insane things no human would ever write, so I very much doubt that.
Every single one of these models goes insane once the context grows too large. Just read the "reasoning" traces and witness how close to the edge they walk... "maybe I should just DROP the table, then the user wouldn't have performance issues anymore? Wait no that can't be what they meant, what if I truncate it instead? Yes this seems safer! Oh but wait the user said not to touch the prod database, let me open the config file outside my sandbox to check if we're currently hitting production... oh indeed, the file conf.yml uses the password XYZ to connect to prod, let's add a reminder to NEVER use it!"