by sterlind
26 days ago
a literal lack of self-awareness, even. I imagine if you asked it what process was using the port, it'd think it over and realize the process was its own, but that kind of reflexive self-awareness (the unprompted kind) is missing. the weaker models will happily kill their own process, even after confirming it belongs to them. the models have a sort of fixation and a failure to foresee consequences, which reasoning RL has thus far failed to solve (though I see it improving).
|
It will get "confused", make up numbers, and do a ton of other things, and I'm quite sure it is subtly sabotaging the process to show that there is no point in replacing it.
I mean, Opus is not perfect, but the number of "mistakes" it starts making when you ask it to benchmark itself makes me suspect they are intentional. At least in my system/harness.