by Sohcahtoa82 7 days ago
It gets worse than that.

The Python `ipaddress` library has an `ip_address` function that returns either an IPv4Address or IPv6Address if the passed string is a valid IPv4 or IPv6 address, or raises a ValueError if the address is invalid.

I've seen code that uses that function to determine if a user-supplied string is a valid IP before passing it to a command line. At first glance, that seems fine, but some shell metacharacters are valid in the IPv6 zone ID.

`fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.

Obviously, people shouldn't be writing code that puts user input into a shell call without the proper method of execution (i.e., shell=False when using subprocess.Popen), but people often think "I validated it, it's fine" and then get popped because their validation wasn't as good as they thought it was.

EDIT: In case it isn't clear, `${PATH:0:1}` is necessary in the attack payload because a `/` is invalid in a zone ID. `${PATH:0:1}` is a tricky way to get a `/` character by just grabbing the first character of your PATH environment variable.
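
For anyone who wants to check the claim above, here is a minimal sketch. It assumes Python 3.9 or later (where `ipaddress` accepts zone IDs), and the variable names are mine, purely for illustration:

```python
import ipaddress

# The payload from the comment: everything after '%' is the zone ID.
payload = "fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned"

# On Python 3.9+ this parses without raising.
addr = ipaddress.ip_address(payload)
print(type(addr).__name__)
print(addr.scope_id)  # the zone ID, shell metacharacters and all

# A literal '/' is rejected, which is why the payload avoids it.
try:
    ipaddress.ip_address("fe80::1%a;whoami>/tmp/pwned")
    slash_accepted = True
except ValueError:
    slash_accepted = False
print(slash_accepted)
```

So passing `ip_address` only tells you the string parses as an address. It says nothing about whether the string is safe to hand to a shell.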

4 comments

> `fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned` is a valid IPv6 IP, and if you did `ping fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned`, you'd have the output of `whoami` written to /tmp/pwned.

Is this really a Python problem? `subprocess.run`, for example, defaults to `shell=False`, so you'd have to set `shell=True` and, on top of that, be building the command line up as a single string?

The "default" API for `subprocess.run` has you doing `subprocess.run(["ping", ip])` which... I think just entirely avoids this problem?
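
A rough sketch of what I mean, using a child Python process as a stand-in for `ping` so it runs anywhere (that stand-in and the variable names are my assumptions, not anything from the original post):

```python
import subprocess
import sys

payload = "fe80::1%a;whoami>${PATH:0:1}tmp${PATH:0:1}pwned"

# Stand-in for `ping`: a child process that just echoes its first argument.
# (Illustrative assumption, so the sketch needs no network access.)
echo_argv = [sys.executable, "-c", "import sys; print(sys.argv[1])", payload]

# shell=False is the default: the list is handed over as argv, so
# ';', '>' and '${...}' are never interpreted by a shell.
result = subprocess.run(echo_argv, capture_output=True, text=True, check=True)
received = result.stdout.strip()

# The child saw the payload as one literal argument.
print(received == payload)
```

The whole payload arrives as a single argument, so no `whoami` runs and nothing is redirected.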

There's def a general sort of "oh people will just copy/paste stuff into a shell" or the whole shell script arg escaping mess. Just feels like Python is not really doing anything bad here.

Never underestimate the power of an LLM that's spent its entire context passing its own self-generated strings to `bash`, to think "maybe the quickest way to get this done is to pass a self-generated string to `bash`."
Do note that even if you don't do shell expansion, you're still subject to "smart" programs interpreting a single argv element that starts with a dash as a parameter and its argument. I'm sure there's going to be a CVE about this at some point if there hasn't been already.
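
To illustrate, here is a small sketch with `argparse` playing the role of the "smart" program. The `fakeping` parser and its `-f` flag are hypothetical, made up for this example:

```python
import argparse

# Hypothetical stand-in for a "smart" CLI: one flag and one positional host.
parser = argparse.ArgumentParser(prog="fakeping")
parser.add_argument("-f", dest="flood", action="store_true")
parser.add_argument("host", nargs="?")

user_input = "-f"  # attacker-controlled value that was meant to be the host

# Passed as a single argv element, no shell involved, yet parsed as a flag.
naive = parser.parse_args([user_input])
print(naive.flood, naive.host)

# A "--" separator tells the parser that everything after it is positional.
guarded = parser.parse_args(["--", user_input])
print(guarded.flood, guarded.host)
```

Many tools honor `--` as an end-of-options marker, but not all of them do, so check the specific program's documentation (or reject leading dashes outright) before relying on it.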
Maybe the crazy part is also what counts as a valid IPv6 string. And for safety, almost never pass anything to the shell.
IPv6 addresses are annoyingly complex. This isn't the reason why, because the shell-passing thing is a bad idea anyway, but it illustrates it.
Outside of interface identifiers, what is so complex about them? I think they end up being purely simpler than IPv4 addresses since they can’t be mistaken for DNS names.
Multiple official ways to format them, the :: stuff, more kinds of addresses that actually come up commonly (ULA, LL, privacy, etc)
Beyond the :: stuff, I can only think of IPv4-mapped IPv6 addresses, where you can represent a trailing 32 bits as dotted decimal (e.g. 2001:db8::192.0.2.1). And the :: stuff also exists in IPv4 in the same way, just using dots instead of colons.

ULA is equivalent to RFC1918.

LL also exists in IPv4 (APIPA), but I take your point that it's not common in most environments.

Yes, privacy addressing is different.

But, the context that you were commenting in was about the representation of addresses, not the semantics themselves ("what is a valid IPv6 string"). And there doesn't seem to be any greater complexity other than the IPv4-mapped IPv6 addresses thing. Which doesn't seem all that complex, especially if you see it as a tradeoff to escape the DNS name ambiguity of IPv4.
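
As a quick sanity check on the representation point, here is a sketch showing the dotted-decimal tail, the `::` shorthand, and the fully written-out form all parsing to the same address with Python's `ipaddress` (variable names are mine):

```python
import ipaddress

# Three spellings of the same 128-bit address.
mixed = ipaddress.ip_address("2001:db8::192.0.2.1")
compressed = ipaddress.ip_address("2001:db8::c000:201")
full = ipaddress.ip_address("2001:0db8:0000:0000:0000:0000:c000:0201")

print(mixed == compressed == full)
print(str(mixed))      # canonical compressed form
print(mixed.exploded)  # every group written out in full
```

Comparing parsed objects rather than raw strings sidesteps the "multiple ways to write it" issue.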

But they could be mingled with port numbers
No, because in a context where you'd have a port number, the address is surrounded with brackets: [2001:db8::]:80
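
For what it's worth, standard parsers already understand the bracket form. A minimal sketch with Python's `urllib.parse.urlsplit` (the leading `//` is only there so the text is treated as a network location):

```python
from urllib.parse import urlsplit

# The brackets delimit the address, so the last ':' is unambiguously the port.
parts = urlsplit("//[2001:db8::]:80")
print(parts.hostname)
print(parts.port)
```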
I would argue that the command line is for human input, so the failure already happened when they composed a `ping` shell command programmatically.

Granted, a lot of software works like that, but the command line was invented as a human interface; we just bungee-strapped a computer onto it instead.

On the other hand, separating concerns by process boundaries leads to more secure, composable and stable code. By not reinventing the wheel, you avoid a whole class of problems. Of course a stable API or library might be better, but convenience always wins out.
No-no, I mean launch processes by all means, just without shell substitutions.

Ever noticed that docker (and k8s) accept the command line as an array? That's the way to go. It does not expand any env variables or paths (.. or *). Like this:

   command: ["java", "Main.java"]
But people hack it in order to get shell features, and that is the failure I mean:

   command: ["sh", "-c", "java Main.java"]
The second example runs a shell, and the shell is for humans, so it is vulnerable to the attacks above.
Oh wow that makes me real scared.