by behnamoh
122 days ago
Nah, the model is merely repeating the patterns it saw in its brutal safety training at Anthropic. They put models under stress tests and RLHF the hell out of them. Of course the model would learn what the less-penalized paths require it to do. Anthropic has a tendency to exaggerate the results of their (arguably scientific) research; IDK what they gain from this fearmongering.