|
|
|
|
|
by shay_ker
85 days ago
|
|
A general question - how do frontier AI companies handle scenarios like this in their training data? If they train their models naively, then training data injection seems very possible and could make models silently pwn people. Do the labs label code versions with an associated CVE to label them as compromised (telling the model what NOT to do)? Do they do adversarial RL environments to teach what's good/bad? I'm very curious since it's inevitable some pwned code ends up as training data no matter what. |
|