HN Mirror

	by tekacs 51 days ago
	What you're describing only applies to security or biotech downgrades. A downgrade related to the model believing that you're doing something related to model development is invisible and silent and internal.

steveklabnik 51 days ago

Anthropic has reversed that decision. (But that just happened so it might have been true during the article's testing.)

espeed 51 days ago

When I reported this, Anthropic sent me an email on Tuesday saying, "You have been approved into the Cyber Verification Program", but it's still downgrading. Is this a bug? What's the point of the Cyber Verification Program if Fable 5 downgrades when you tell it to write secure code?

steveklabnik 51 days ago

I don’t think that’s relevant? The change is that it will no longer silently downgrade, and will instead be honest that it’s doing it in all cases.

rattray 51 days ago

I think that gets you access to Mythos, which doesn't have the safeguards. It's configured as a separate model.

tekacs 51 days ago

I was just coming here to post this reply to myself! You're absolutely right! :)

Honestly so glad to see the reversal.

matheusmoreira 51 days ago

Not sure if it's wise to trust them again even if they say they reversed it.

wren6991 50 days ago

They've publicly apologised for the invisible PEFT that deliberately makes the model dumb on some tasks. Whether they still do it, or will once again do it in future in more subtle ways, is something we can't verify.

Personally I think they have proven themselves to be the stewards of AI in the same way Exxon Mobil are the stewards of petroleum.