HN Mirror


	by shay_ker 85 days ago
	A general question - how do frontier AI companies handle scenarios like this in their training data? If they train their models naively, then training data injection seems very possible and could make models silently pwn people. Do the labs label code versions with an associated CVE to label them as compromised (telling the model what NOT to do)? Do they do adversarial RL environments to teach what's good/bad? I'm very curious since it's inevitable some pwned code ends up as training data no matter what.

4 comments

tomaskafka 85 days ago

Everyone’s (well, except Anthropic, they seem to have preserved a bit of taste) approach is the more data the better, so the databases of stolen content (erm, models) are memorizing crap.

datadrivenangel 85 days ago

This was apparently a compromise of the library owners' GitHub accounts, so it is not the same scenario as dangerous code in the training data.

I assume most labs don't do anything to deal with this, and just hope that it gets trained out because better code should be better rewarded in theory?

Havoc 85 days ago

By betting that it dilutes away and not worrying about it too much. Bit like dropping radioactive barrels into the deep ocean.

ting0 85 days ago

Yeah, and that won't hold up for long. Just wait until some well-resourced attacker replicates their exploit across tens of thousands of sources they know will be scraped and included in the training set, to bias the model toward producing their vulnerable code. Only a matter of time.

Imustaskforhelp 85 days ago

I am pretty sure that such measures aren't taken by AI companies, though I may be wrong.

alansaber 85 days ago

The API/online model inference definitely runs through some kind of edge safeguarding models, which could do this.
