| HN Mirror



	by nomel 52 days ago
	> a LORA that's designed to inject bugs into your code

	A statement like this clearly requires a reference.


mips_avatar 52 days ago

From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning" aka they will take your ML research code and inject bugs into it until it breaks using a LORA (or some other form of PEFT)


sciencejerk 51 days ago

Are they trying to fight back against model distillation?


bee_rider 52 days ago

“Limit effectiveness” could mean introducing performance degradation in your code. Which is arguably some sort of performance bug (I mean, ML codes are supposed to be high performance so I’d call unnecessary degradation a bug), but it could be borderline.


rurban 51 days ago

No, it is just a prominent "Cyber Security threat detected" blocker, with a button to appeal. I appealed because my work had nothing to do with either cyber or security, but the appeal was auto-closed. So no more Claude for this work.


nomel 52 days ago

Thanks, I thought maybe I missed something. That's an interesting way to interpret that.


mips_avatar 52 days ago

Anthropic is trying to hide bad behavior by being vague; it's important not to be vague when calling it out.


nomel 52 days ago

I'm of the opinion that removing guardrails is how you force regulation. What's your opinion on the balance?


dannyw 52 days ago

They have all transcripts for at least 30 days. The problem is that (as anyone who used Fable can attest) their classifiers are extremely sensitive and catch tons of innocent queries.

Imagine being a data scientist or MLE training a small classifier model. How do you know you won’t get steering vectors or a PEFT applied?


nomel 52 days ago

Since your answer isn't direct, I'm having a little trouble interpreting it.

Are you saying they should relax guardrails since they have 30 days to know if you produced something bad? If that is what you're saying, then I suspect they chose their current path to prevent it, since you can't un-produce. Producing is what would cause regulations/PR problems.


mips_avatar 52 days ago

They’re not safety guardrails, they’re “Anthropic doesn’t like anyone who isn’t Anthropic working on AI” rails.


giancarlostoro 52 days ago

PEFT is a library; one of its capabilities is to produce LoRAs.

See:

https://heidloff.net/article/efficient-fine-tuning-lora/


adw 52 days ago

It's just an acronym, "parameter-efficient fine tuning". LoRA is one method, prefix tuning is another, there are more.
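
For anyone wondering what "parameter-efficient" buys you, here is a rough sketch of the LoRA idea in plain Python. The function names and the 4096 x 4096 layer size are made up for illustration, and this is not how any particular library implements it: the pretrained weight W stays frozen and only two small factors B (d x r) and A (r x k) are trained, with the update scaled by alpha / r.

```python
def full_params(d, k):
    """Trainable parameters when fine-tuning a d x k weight matrix directly."""
    return d * k


def lora_params(d, k, r):
    """Trainable parameters for a rank-r update B @ A, with B: d x r and A: r x k."""
    return r * (d + k)


def matmul(x, y):
    """Plain nested-list matrix product (illustration only, not optimized)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(W, B, A, alpha, r):
    """Return W + (alpha / r) * (B @ A); W itself is left untouched (frozen)."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    # Hypothetical layer size and rank, chosen only to show the ratio.
    d, k, r = 4096, 4096, 8
    print(full_params(d, k), lora_params(d, k, r))
```

With those assumed numbers the low-rank factors hold well under 1% of the parameters of the full matrix, which is the sense in which LoRA counts as one PEFT method among several.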
