by gck1
6 days ago
Wait, so to get this straight, Anthropic knows:

1) LLMs are non-deterministic
2) This class of models has a particular tendency to "misbehave"
3) Their classifiers have a high rate of false positives
4) Millions of people give these models access to their machines

And they still decided to specifically train this model to sabotage work if it thinks the work may be in competition with Anthropic?

I think this has a name. I think it may be called malware.