by Kim_Bruning
118 days ago
> You're implicitly assuming that what you asked the LLM to do is unrepresented in the training data. That assumption is usually faulty - very few of the ideas and concepts we come up with in our everyday lives are truly new.

I made a cursed CPU in the game 'Turing Complete' and had an older version of Claude build me an assembler for it. Good luck finding THAT in the training data. :-P (Just to be sure, I then had it write actual programs in that new assembly language.)
|
Claude 4.5: not too overfitted -- does the right thing 6/10 times.
Claude 4.6: overfitted -- does the right thing 2/10 times.
OpenAI 5.3: overfitted -- does the right thing 3/10 times.
These aren't perfect benchmarks, but they let me know how much babysitting I need to do.
My point being that older Claude models weren't overfitted nearly as much, so I'm confirming what you're saying.