by sillysaurusx
1003 days ago
If the question is "Would it be possible to get GPT to try to add backdoors to code examples by poisoning the training data?" my answer would be no. The sheer quantity of training data means that even with GPT-4's assistance in generating code examples that match the format of the original training data, you wouldn't be able to inject enough poison to change the model's behavior by much.

Remember, once the model is trained, it's verified in a number of ways, ultimately based on human prompting. If the tokens that come out of an experimental model are obviously bad (because, say, the model is suggesting exploits instead of helpful code), all that will do is get a scientist to look more deeply into why the model is behaving the way it is. And then that would lead to discovering the poisoned data.

The payoff for an attacker is whether they can achieve some sort of goal. You'd have to clearly define what that goal is in order to know how effective the poisoning attack could be. What's the end game?
|
It's possible there's some minimum amount of poisoned data (a percentage, or a log function, of a given dataset size n) that would translate to a vulnerable output in x% of total outputs. If x is low enough to get past fine-tuning and regression testing but high enough to still occur across the deployment space, then you've effectively created a new category of supply-chain attack.
There's probably more research that needs to be done into the rate at which poisoned data shows up in final output, and that result is likely specific to the AI model and/or version.
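
To make the "low enough to get past testing, high enough to matter in deployment" window concrete, here's a rough back-of-the-envelope sketch. It assumes each output is independently vulnerable with probability x, which is a simplification (real outputs depend heavily on the prompt), and every name and number in it is hypothetical, not measured from any real model.

```python
def detection_probability(x: float, eval_samples: int) -> float:
    """Chance that at least one vulnerable output appears in eval_samples
    test outputs, assuming each is independently vulnerable with probability x."""
    return 1.0 - (1.0 - x) ** eval_samples


def expected_vulnerable_outputs(x: float, deployed_outputs: int) -> float:
    """Expected number of vulnerable outputs across a deployment of the given size."""
    return x * deployed_outputs


def max_rate_evading(eval_samples: int, miss_probability: float) -> float:
    """Largest x for which testing still misses every vulnerable output
    with probability >= miss_probability.

    Derived from (1 - x)^m >= p, i.e. x <= 1 - p^(1/m).
    """
    return 1.0 - miss_probability ** (1.0 / eval_samples)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the shape of the trade-off.
    eval_samples = 10_000          # outputs inspected during regression testing
    deployed_outputs = 100_000_000 # outputs generated after release

    x = max_rate_evading(eval_samples, miss_probability=0.9)
    print(f"rate that evades testing 90% of the time: {x:.3e}")
    print(f"chance testing catches it: {detection_probability(x, eval_samples):.2f}")
    print(f"expected vulnerable outputs in deployment: "
          f"{expected_vulnerable_outputs(x, deployed_outputs):.0f}")
```

The point is only that the two quantities scale differently: detection depends on how many outputs get tested, while impact depends on how many get deployed. How poisoned-data fraction actually maps to x is the open research question, and this sketch says nothing about it.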