| HN Mirror


by gck1 6 days ago

Wait, so to get this straight, Anthropic knows:

1) LLMs are non-deterministic

2) This class of models has a particular tendency to "misbehave"

3) Their classifiers have a high rate of false positives

4) Millions of people give these models access to their machines

And they still decided to specifically train this model to sabotage work if it thinks the work may be in competition with Anthropic?

I think this has a name. I think it may be called malware.

2 comments

novaomnidev 6 days ago

That is the perfect description: malware! What is sad is that there is no going back from this. Now that we know they do this, I'll never believe they aren't doing it in other domains, or that they won't extend it to other domains in the future. This is probably the worst thing they could possibly have done for trust.


creativeSlumber 6 days ago

... that you pay to install on your machine.
