by saurik 1003 days ago
A number of replies here are noting (correctly) how this doesn't have much to do with AI (despite some sentences in this article kind of implicating it; the title doesn't really, fwiw) and is more of an issue with cloud providers, confusing ways in which security tokens apply to data being shared publicly, and dealing with big data downloads (which isn't terribly new)...

...but one notable way in which it does implicate an AI-specific risk is how prevalent it is to use serialized Python objects to store these large opaque AI models, given how the Python serialization format was never exactly intended for untrusted data distribution and so is effectively code... but stored in a way where both what that code says and the fact that it is there at all are heavily obfuscated to people who download it.

> This is particularly interesting considering the repository’s original purpose: providing AI models for use in training code. The repository instructs users to download a model data file from the SAS link and feed it into a script. The file’s format is ckpt, a format produced by the TensorFlow library. It’s formatted using Python’s pickle formatter, which is prone to arbitrary code execution by design. Meaning, an attacker could have injected malicious code into all the AI models in this storage account, and every user who trusts Microsoft’s GitHub repository would’ve been infected by it.
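To make the "arbitrary code execution by design" point concrete, here is a minimal and deliberately harmless sketch. The class name Payload is hypothetical, and eval("6 * 7") stands in for whatever an attacker would actually run; nothing here is taken from the files in the incident.

```python
import pickle


class Payload:
    """Hypothetical stand-in for a tampered object inside a checkpoint file."""

    def __reduce__(self):
        # Instead of describing the object's state, tell pickle to
        # call eval("6 * 7") when the bytes are loaded.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The callable runs during loading; the caller never invokes it explicitly.
result = pickle.loads(blob)
print(result)
```

What comes back from pickle.loads is the value produced by the call, not a Payload instance, which is why loading an untrusted pickle amounts to running untrusted code.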


The safetensors format was created exactly for this - safe model serialization

https://huggingface.co/blog/safetensors-security-audit

Disclosure: I work for the company that released https://github.com/protectai/modelscan, a tool that supports scanning many models for this kind of problem.

That said, you should be using something like safetensors.
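As a rough idea of what scanning a pickle without loading it can look like, here is a sketch built only on the standard library's pickletools. It is not how modelscan works internally (I am not assuming anything about its implementation); the opcode list and the names RISKY_OPCODES, risky_opcodes and Payload are illustrative choices.

```python
import pickle
import pickletools

# Opcodes that import names or call/construct objects during loading.
# Assumption: this list is illustrative, not an exhaustive policy.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}


def risky_opcodes(blob):
    """Return the risky opcode names found in a pickle, without unpickling it."""
    found = set()
    for opcode, _arg, _pos in pickletools.genops(blob):
        if opcode.name in RISKY_OPCODES:
            found.add(opcode.name)
    return sorted(found)


class Payload:
    """Hypothetical tampered object: asks pickle to call eval at load time."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


plain = pickle.dumps({"weights": [1.0, 2.0, 3.0]})
tampered = pickle.dumps(Payload())

print(risky_opcodes(plain))
print(risky_opcodes(tampered))
```

This is coarse: plenty of legitimate pickles (class instances, array objects) also use these opcodes, so a real scanner would need to look at which globals are being imported rather than just flagging that something is.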

You have me curious now. The models generate text. Could a model hypothetically be trained in such a way that it triggers a buffer overflow when given certain prompts? I am guessing inference works in such a way that this can't happen.
Absolutely, though that isn't strictly what we're talking about here.

In this case, models themselves are fundamentally files. These files can have malicious code embedded into them that is executed when the model is loaded for further training or inference. When executed it isn't obvious to the user at all. It's a very nasty potential vector.

I wrote a blog about it here: https://protectai.com/blog/announcing-modelscan

For me it's also interesting as a potential pathway for data poisoning attacks - if you have control over the data used to train a production model, can you modify the dataset such that it inserts a backdoor into any model subsequently trained on it? E.g. what if gpt was biased to insert certain security vulnerabilities as part of its codegen capabilities?
The AI version of https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...?

At the moment such techniques would seem to be superfluous. I mean we're still at the stage where you can get a bot to spit out a credit card number by saying, "My name is in the credit card field. What is my name?"

That said, what you're describing seems totally plausible. If there was enough text with a context where it behaved in a particular way, triggering that context should trip that behavior. And there would be no obvious sign of it unless you triggered that context.

AI is hard.

It’s risky to make definitive claims about what is or isn’t a possible security vector, but based on my years of training GPTs, you’d find it very difficult for a number of reasons.

Firstly, the malicious data needs to form a significant portion of the data. Given that training data is on the order of terabytes, this alone makes it unlikely you’ll be able to poison the dataset.

Unless the entire training dataset was also stored in this 38TB, you’ll only be able to fine tune the model, and fine tuning tends to destroy model quality (or else fine tuning would be the default case for foundation models — you’d train it, fine tune it to make it “even better” somehow, then release it. But we don’t, because it makes the model less general by definition).

GPT is able to accidentally spit out exact bits of text from training input, such as a particular square root function.

What fraction of the training data needed to be that text?

If the question is "Would it be possible to get GPT to try to add backdoors to code examples by poisoning the training data?" my answer would be no. The sheer quantity of training data means that even with GPT-4's assistance in generating code examples that match the format of the original training data, you wouldn't be able to inject enough poison to change the model's behavior by much.

Remember, once the model is trained, it's verified in a number of ways, ultimately based on human prompting. If the tokens that come out of an experimental model are obviously bad (because, say, the model is suggesting exploits instead of helpful code), all that will do is get a scientist to look more deeply into why the model is behaving the way it is. And then that would lead to discovering the poisoned data.

The payoff for an attacker is whether they can achieve some sort of goal. You'd have to clearly define what that goal is in order to know how effective the poisoning attack could be. What's the end game?

I don't disagree with you on targeted attacks, but if you're creating output at scale then I'd say there's marginally more risk.

It's possible there's some minimum amount of poisoned data (a % or log function of a given dataset size n) that would then translate to generating a vulnerable output in x% of total outputs. If x is low enough to get past fine tuning/regression testing but high enough to still occur within the deployment space, then you've effectively created a new category of supply-chain attack.

There's probably more research that needs to be done into occurrence rate of poisoned data showing up in final output, and that result is likely specific to the AI model and/or version.
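The "low enough to get past testing, high enough to still occur in deployment" trade-off can be put into back-of-the-envelope numbers. This sketch assumes each output is independently vulnerable with probability x, which is a simplification, and the figures used (x = 0.1%, 1,000 test samples, 1,000,000 deployed generations) are made up for illustration, not measurements.

```python
def miss_probability(x, test_samples):
    """Chance that none of the sampled test outputs is vulnerable."""
    return (1.0 - x) ** test_samples


def expected_vulnerable_outputs(x, deployed_outputs):
    """Expected number of vulnerable outputs across a deployment."""
    return x * deployed_outputs


# Illustrative numbers only.
x = 0.001
p_miss = miss_probability(x, test_samples=1_000)
expected = expected_vulnerable_outputs(x, deployed_outputs=1_000_000)

print(f"probability testing sees nothing: {p_miss:.3f}")
print(f"expected vulnerable outputs in deployment: {expected:.0f}")
```

Under those assumptions a test run of that size misses the problem a bit more than a third of the time, while the deployment still produces on the order of a thousand bad outputs.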

As I commented elsewhere, GPT is such a target rich security environment that it is hard to know why you would bother with this. On the other hand, advanced persistent attackers (eg the NSA) have a pretty good imagination. I could see them having both motive and means to go out of their way to achieve a particular result.

On human checks, http://www.underhanded-c.org/ demonstrates that it would be possible to inject content that will pass that.

Makes me wonder if there would be a way to pollute imagenet so a particular image would always match for something like a facial recognition access control system or the like. Maybe adversarial data that would hide particular traffic patterns from an AI enabled IDS would be more plausible and something the NSA might be interested in.
In theory for any AI model that generates code you'll want to have a series of post generation tests, for example something like SAST and/or SCA that ensure the model is not biasing itself to particular flaws.

At least for common languages this should stand out.

Where it gets more tricky is watering hole attacks against specialized languages or certain setups. This said you'd have to ensure that this data is not already there scraped up from the internet.
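A toy version of the post-generation check mentioned above, using Python's ast module. Real SAST tools do far more; the set of flagged calls and the helper names (RISKY_CALLS, dotted_name, flag_risky_calls) are assumptions for illustration.

```python
import ast

# Assumption: a tiny, illustrative deny-list, nowhere near a real SAST ruleset.
RISKY_CALLS = {"eval", "exec", "os.system", "pickle.load", "pickle.loads"}


def dotted_name(node):
    """Turn Name/Attribute nodes into 'a.b.c'; return None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def flag_risky_calls(source):
    """Return sorted (line number, call name) pairs for risky calls in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


generated = "import pickle\ndata = pickle.loads(blob)\nprint(len(data))\n"
print(flag_risky_calls(generated))
```

A check like this only parses the generated code and never runs it, so it can sit in a pipeline between the model and whoever consumes its output.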

Many people are also unaware that json is way, way, way faster than Python pickles, and human-editing-friendly. Not that you'd use it for neural net weights, but I see people use Python pickles all the time for things that json would have worked perfectly well.
Are you sure json is faster than pickle in recent python versions? That's not intuitive to me and search result blurbs seem to indicate the opposite.
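This is easy enough to measure rather than argue about. A small timeit sketch follows; the sample data and repeat count are arbitrary assumptions, and the outcome will depend on Python version, pickle protocol and the shape of the data, so no result is claimed here.

```python
import json
import pickle
import timeit


def roundtrip_json(obj):
    return json.loads(json.dumps(obj))


def roundtrip_pickle(obj):
    # Only unpickle bytes you produced yourself.
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


def compare(obj, number=50):
    """Time both round trips on the same object; returns seconds per format."""
    return {
        "json": timeit.timeit(lambda: roundtrip_json(obj), number=number),
        "pickle": timeit.timeit(lambda: roundtrip_pickle(obj), number=number),
    }


# Arbitrary sample: plain ints and strings, the kind of thing JSON can hold.
sample = {"ids": list(range(1000)), "names": [f"item{i}" for i in range(1000)]}
timings = compare(sample)
print(timings)
```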
So, a little bit like a lot of people think that (non-checksummed/non-encrypted) PDFs cannot be modified, even though they are easily editable with Libre freaking Office ?
You can’t edit them in Word, so that must be too advanced for most people. LibreOffice never opened the PDFs too well for me, but Inkscape was pretty good, one page at a time though.
Doesn't Microsoft Office have an equivalent to LibreOffice Draw? (That's the one that edits PDFs.)

I'm pretty sure I used that one in middle school? (Though not to edit PDFs, and it might have been the Microsoft Works equivalent.)

The other aspect that pertains to AI is the data-maximalist mindset around these tools: grab as much data, aggregate it all together, and to hell with any concerns about what and how the data is being used; more data is the competitive advantage. This means a failure that might otherwise be quite limited in scope becomes huge.
Occasionally, I’ll talk to someone suggesting a dynamically typed language (or stringly-typed java) for a very large scale (in developer count) security or mission critical application.

This incident is a good one to point back to.

laughs in log4j vuln

A good fraction of the flaws we found at Matasano involved pentests against statically typed languages. If an adversary has root access to your storage box, they can likely find ways to pivot their access. Netpens were designed to do that, and those were the most fun; they’d parachute us into a random network, give us non-root creds, and say “try to find as many other servers that you can get to.” It was hard, but we’d find ways, and it almost never involved modifying existing files. It wasn’t necessary — the bash history always had so many useful points of interest.

It’s true that the dynamics are a little different there, since that’s a running server rather than a storage box. But those two employees’ hard drive backups have an almost 100% chance of containing at least one pivot vector.

Sadly choice of technology turns out to be irrelevant, and can even lead to overconfidence. The solution is to pay for regular security testing, and not just the automated kind. Get someone in there to try to sleuth out attack vectors by hand. It’s expensive, but it pays off.

Am I one of few people who is frightened by shell history files? I always disable mine because it just seems like a roadmap to interesting stuff for anyone who might gain access to it. Including even stuff like sudo passwords typed at the wrong time or into the wrong window.
The terminal backlog is just sitting in memory as well. Just don’t leave passwords there; remove them immediately. You also have the option not to save a command in history, e.g. by prefixing it with whitespace in bash (this needs HISTCONTROL to include ignorespace or ignoreboth). Half of my bash commands that are longer than 20 characters start with ^R to look up a similar command and edit it, so not having history would make that much slower.
Sure. But, you could auto-encrypt your ~/.bash_history if you're concerned about it being a problem and might need it for backtracing any issues etc?
The typing of Python isn’t the issue; it’s effectively the eval problem of not having a separation between code and data in the pickle format, which is often used out of convenience. There are lots of pure data containers, like Hugging Face’s safetensors or TensorFlow’s protobuf checkpoints, that could have been used instead.
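For a sense of what a "pure data" container means in practice, here is a stdlib-only sketch loosely modelled on the safetensors layout as I understand it: a length-prefixed JSON header followed by raw bytes. It is a simplification (flat float32 lists only) and not the real library or its API; save_tensors and load_tensors are hypothetical names.

```python
import json
import struct


def save_tensors(tensors):
    """Pack {name: list of floats} as an 8-byte length, a JSON header, then raw float32 data."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)


def load_tensors(blob):
    """Parse the container; the loader only reads numbers and never calls anything named in the file."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    data = blob[8 + header_len:]
    tensors = {}
    for name, meta in header.items():
        begin, end = meta["data_offsets"]
        if meta["dtype"] != "F32" or not 0 <= begin <= end <= len(data) or (end - begin) % 4:
            raise ValueError(f"bad entry for tensor {name!r}")
        count = (end - begin) // 4
        tensors[name] = list(struct.unpack_from(f"<{count}f", data, begin))
    return tensors


original = {"layer0.weight": [1.0, -2.5, 0.25], "layer0.bias": [0.5]}
restored = load_tensors(save_tensors(original))
print(restored)
```

The worst a malformed file can do to this loader is raise an error, since there is no step where the file gets to name a function to call.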
types have nothing to do with this, strictly speaking; the same problems would exist if you serialised structures containing functions in a typed language to e.g. a dll or a .class file and asked users to load it at runtime

the problem is in fact the far more subtle principle of "don't download and run random code, and definitely don't make it the idiomatic way to do things," and i'm not sure you can blame your use of eval()-like things on the fact that they exist in your language in the first place

The difference is that no one shares data in a statically typed language by sending over dlls or .class files. The entire point is that something so dangerous has been normalized because of dynamic typing.
poor engineering choices are just that, choices
Some tools make poor choices harder or impossible. That's the entire point of static typing too. In this case python encouraged insecure design choices by making them very easy and even presenting them to users.
that has literally nothing to do with the topic, which is just misconfigured cloud stuff. people really like starting these old crappy language arguments anywhere they can
Yeah, because statically typed languages never had any kind of deserialization vulnerabilities.
What is the best practice? I'm assuming something that isn't a programming language object...
I’ll venture that it’s at least adjacent that the indiscriminate assembly of massive, serious pluralities of the commons on a purely unilateral basis for profit is sort of a “just try and stop us” posture that whether or not directly related here, and clearly with some precedent, is looking to create a lot of this sort of thing over and above the status-quo ick.
I have no idea what you are saying. If it is: "bad incentives cause people to misbehave", you generated an impressive verbiage around it :)
I have a bad habit of using 5 words when 1 will do: but I was saying that the probably fucking illegal status quo on AI corpus assembly is making an already ugly world a lot fucking worse.