The primary Linux interfaces for invoking programs with arguments (both in libc and at the syscall level) take each argument as its own string, so it's possible to invoke a program with arguments such that no escaping or unescaping happens at all. If you want escaping, you have to either invoke /bin/sh (and give it your escaped command+argument string as a single unescaped argument), or use 'system()' (which is literally defined as shorthand for that /bin/sh invocation). The kernel works entirely in the unescaped, proper list form, which even lets you do horrible things like make arg 0 something other than the invoked binary, or not have a 0th arg at all (though recent kernels substitute a single empty string when argv is empty).
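A minimal Python sketch of that difference on a POSIX system (assumes /bin/sh exists; `run_list_form`, `run_via_sh` and the inline child program are hypothetical names made up for illustration). The list form hands each string to the child untouched. The `/bin/sh -c` form, which is roughly what `system()` does, passes one string that the shell splits again, so the caller has to escape it (here with `shlex.join`).

```python
import json
import shlex
import subprocess
import sys

# Hypothetical child program: prints the arguments it received as a JSON list.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def run_list_form(args):
    """Exec the child with an argv list; nothing is escaped or unescaped."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, *args],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def run_via_sh(command_line):
    """Roughly what system() does: hand one string to /bin/sh -c, which re-splits it."""
    result = subprocess.run(
        ["/bin/sh", "-c", command_line],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


tricky = ["arg with spaces", 'a "quoted" word', "$HOME", ""]

# List form: the child sees exactly the strings we passed.
print(run_list_form(tricky))

# Shell form: the caller must escape, because the shell parses the string again.
prefix = shlex.join([sys.executable, "-c", CHILD])
print(run_via_sh(prefix + " " + shlex.join(tricky)))  # same list as the list form
print(run_via_sh(prefix + " arg with spaces"))          # ['arg', 'with', 'spaces']
```

The escaping only exists because a shell was put in the middle; drop the shell and there is nothing to escape.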
Sure, but nothing stops /bin/sh from doing stupid things with its argument list once it gets it, which, from what I understand, is the equivalent of what is happening on Windows.
If a process uses the execv* functions to spawn a new process (as most good implementations should), it doesn't involve /bin/sh in any way and thus cannot itself cause anything to be misinterpreted. The spawned process could of course still do arbitrary computation on its inputs, including introducing vulnerabilities, but that would strictly not be the caller's fault, and the caller couldn't do anything about it: there's only one format in which a given list of arguments can be passed to execv*, and the spawned process still receives the arguments separately, whereas on Windows the spawned process can forgo the standard unescaping completely.
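A hedged Python sketch of why the caller can't get this wrong with the exec family (POSIX only; assumes a standalone `printf` is on PATH; the three helper names are hypothetical). `subprocess.run` with a list and without `shell=True` execs the program directly, so a string containing shell syntax stays inert. Pasting the same string into a shell command line does not.

```python
import shlex
import subprocess

# Untrusted input that happens to contain shell syntax (harmless payload).
payload = "$(echo injected)"


def list_form(value):
    """No shell involved: printf receives the string as one literal argument."""
    return subprocess.run(["printf", "%s", value],
                          capture_output=True, text=True, check=True).stdout


def naive_shell(value):
    """Bug: the value is pasted into a shell command line unescaped."""
    return subprocess.run("printf %s " + value, shell=True,
                          capture_output=True, text=True, check=True).stdout


def quoted_shell(value):
    """Shell form done right: the caller escapes for /bin/sh with shlex.quote."""
    return subprocess.run("printf %s " + shlex.quote(value), shell=True,
                          capture_output=True, text=True, check=True).stdout


print(list_form(payload))     # $(echo injected)
print(naive_shell(payload))   # injected  (the command substitution ran)
print(quoted_shell(payload))  # $(echo injected)
```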
The core of it really is that on Linux a list of separate arguments is The Format for arguments, so it can be exposed and used without question, whereas on Windows the native format is a single string. That string can still be used to achieve the same things, but now the callee must necessarily parse it the same way the caller encoded it (if it splits it into arguments at all), and stdlibs so far have just been assuming one format, whereas .bat files use a different one.
The difference is that this bug exists at the border of two distinct components.
Suppose `/bin/sh` concatenated all arguments together, then split them back apart. That would be a stupid thing to do, but that stupidity would be entirely contained within `/bin/sh`. A bug report for `/bin/sh` could clearly point to the broken component and state that it needs to be fixed. This is possible because the `execve` API provides a list of strings. Any extra (concatenate, split) pairs must exist on one side or the other of the border imposed by `execve`.
Here, there's a mismatch between two entirely separate components. The `CreateProcess` API accepts an arbitrary string. In the child, the `GetCommandLine` function returns that same arbitrary string. The (concatenate, split) pair must straddle the border between the two processes, with concatenation done on the side that calls `CreateProcess` and splitting done on the side that calls `GetCommandLine`. A developer for the parent process can shrug and say that it's the fault of the child process for not parsing arguments correctly. A developer for the child process can shrug and say that it's the fault of the parent process for not providing arguments in the expected form.
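A pure-Python sketch of that straddling (concatenate, split) pair, runnable on any OS. `join_msvcrt` and `split_msvcrt` are hypothetical helpers that approximate the convention of the Microsoft C runtime (backslashes only escape a following double quote); the special rules for the program name and for `""` inside quotes are deliberately left out. `split_naive` is a hypothetical child that parses the command line its own way. The round trip only works when both sides happen to agree, and nothing in the API tells either side which convention the other one uses.

```python
def join_msvcrt(args):
    """Parent side (hypothetical): encode an argv list into one command-line string."""
    parts = []
    for arg in args:
        if arg and not any(c in arg for c in ' \t"'):
            parts.append(arg)  # nothing special, pass through as-is
            continue
        out = ['"']
        backslashes = 0
        for c in arg:
            if c == "\\":
                backslashes += 1  # only matters if a quote follows
            elif c == '"':
                # Double pending backslashes, then escape the quote itself.
                out.append("\\" * (2 * backslashes + 1) + '"')
                backslashes = 0
            else:
                out.append("\\" * backslashes + c)
                backslashes = 0
        # Backslashes right before the closing quote must be doubled.
        out.append("\\" * (2 * backslashes) + '"')
        parts.append("".join(out))
    return " ".join(parts)


def split_msvcrt(cmdline):
    """Child side (hypothetical): decode the string with the same simplified rules."""
    args, current = [], []
    in_quotes = have_arg = False
    i, n = 0, len(cmdline)
    while i < n:
        c = cmdline[i]
        if c == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                current.append("\\" * (count // 2))
                if count % 2:  # odd: the quote is a literal character
                    current.append('"')
                    j += 1
                # even: leave the quote to toggle quoting on the next pass
            else:
                current.append("\\" * count)  # not before a quote: literal
            i = j
            have_arg = True
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(current))
                current, have_arg = [], False
            i += 1
        else:
            current.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(current))
    return args


def split_naive(cmdline):
    """A hypothetical child with a different idea of the format: split on whitespace."""
    return cmdline.split()


argv = ["plain", "with space", 'say "hi"', "C:\\dir with space\\", ""]
line = join_msvcrt(argv)
print(line)
print(split_msvcrt(line) == argv)  # True: both sides agree on the convention
print(split_naive(line))           # a different list: the conventions disagree
```

Neither function is wrong on its own; the bug only exists in the pairing, which is exactly the part no single developer owns.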
pathname must be either a binary executable, or a script starting with a line of the form: #!interpreter [optional-arg]
which is the equivalent of Windows starting CMD.EXE to execute a batch file. The only difference WRT the shell being invoked implicitly is how a script is detected (file name extension vs. first line of content), but that doesn't seem to be relevant when it comes to the shell misinterpreting its inputs.
It is still pretty relevant, as the .bat file's contents can't prevent improper arguments from causing arbitrary code execution, whereas a #!/bin/sh file invoked with arbitrary arguments will not execute any code other than what the file itself asks for. And the #! form still passes the arguments as separate elements, i.e. a file starting with "#!/usr/bin/python3" invoked via an execve argv of ["the-file", "arg1 \"foo ^%`'\\", "arg2"] will result in a total invocation of ["/usr/bin/python3", "the-file", "arg1 \"foo ^%`'\\", "arg2"] (the kernel doesn't search PATH for the interpreter, hence the absolute path; JSON-formatted here for clarity, there's no backslash-escaping happening anywhere in reality).
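A small Linux-only sketch to check this, under stated assumptions: the temp directory must allow execution, and the path of the current Python interpreter must contain no spaces, since it is written into the `#!` line. `run_shebang_script` is a hypothetical name. The kernel reads the `#!` line and builds the new argv itself, so the arguments from the example above arrive as separate strings with no shell and no unescaping in between.

```python
import json
import os
import subprocess
import sys
import tempfile

# Script body: report the argv the interpreter handed to the script.
SCRIPT = f"#!{sys.executable}\nimport json, sys\nprint(json.dumps(sys.argv))\n"


def run_shebang_script(args):
    """Hypothetical helper: write an executable #! script and exec it directly (no shell)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "the-file")
        with open(path, "w") as f:
            f.write(SCRIPT)
        os.chmod(path, 0o755)
        # The kernel sees the #! line and runs: <interpreter> <path> <args...>
        result = subprocess.run([path, *args], capture_output=True,
                                text=True, check=True)
        return json.loads(result.stdout)


args = ["arg1 \"foo ^%`'\\", "arg2", "$(echo not executed)"]
received = run_shebang_script(args)
print(received[0].endswith("the-file"))  # True: argv[0] is the script path
print(received[1:] == args)               # True: arguments arrive untouched
```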
It's similar if you're invoking through the shell, which should be avoided for precisely this reason, but if you're launching a program directly (through execv or CreateProcess) there is one big difference:
On Linux the parsing is done on the caller's side of the interface, so the caller knows what quoting rules to apply: Python's rules if they're using Python to construct the argument array, bash's rules if they're using bash, etc.
On Windows, the parsing is done on the receiver's side of the interface, so the caller can't know how it's supposed to be quoted unless they have special knowledge of a specific receiver and its parsing rules.