Hacker News new | ask | show | jobs
by OskarS 1872 days ago
Haven't looked into this paper deeply, but this reads very strange to me:

> This paper reports on the discovery of an accidental arbitrary code execution vulnerability in Marvin Minsky's 1967 implementation of the universal Turing machine. By submitting crafted data, the machine may be coerced into executing user-provided code.

It's a universal Turing machine. Its whole purpose is running "user-provided code". That's what a Universal Turing machine does, it runs arbitrary Turing machines. This is a little bit like saying "we found a weakness in the Python interpreter whereby you can feed it specially crafted input that allows you to run arbitrary Python programs". Like... yeah... that's what it's supposed to do.

3 comments

If you do read the paper, you see that the universal machine U expects both a description of the simulated machine T, and a description of the input to T. The exploit is that they can cause arbitrary code execution not by changing the former but the latter. They also suggest a fix to U that makes it robust against exploits in the description of T's input.
> This is a little bit like saying "we found a weakness in the Python interpreter whereby you can feed it specially crafted input that allows you to run arbitrary Python programs".

If my understanding is correct, it's more like, "we found a weakness in the Python interpreter whereby you can feed it a specially crafted input to a Python script that allows you override that script and forces the Python interpreter to do something else", which can be a reasonable point to make.

BTW, if we keep using the Python analogy, this scenario sounds like a Python script that loads its configuration data from a Pickle file on the hard drive. In Python, Pickle doesn't validate the input and it's capable of modifying the internal state of the program. What the authors say is that malicious data in the Pickle file leads to code execution due to deserialization of untrusted data. Depending on your perspective, just like Python says a malicious Pickle file is not in its threat model, you can argue that the Universal Turing Machine is never meant to be protected from malicious data on the tape, and this is not a exploit, but an interesting thought experiment nevertheless.

The paper says,

> The universal machine, U, will be given just the necessary materials: a description, on its tape, of T and of [the initial configuration on T's own, simulated tape] s_x; some working space; and the built-in capacity to interpret correctly the rules of operation as given in the description of T. Its behavior will be very simple. U will simulate the behavior of T one step at a time [...]

> [...] There is one obvious trust boundary in a universal Turing machine, U: the initial string on the tape of the simulated Turing machine, T. That string corresponds to the user-provided data of an ordinary computer program. Because the potential users may be unknown to the developers and administrators of the computer and its programs, it is common to view this data as untrusted. In our explorations of the universal Turing machine, we will make the same assumption. Therefore, if it were possible to execute arbitrary code without manipulating the program of T, but only by providing crafted data on T’s simulated tape, that would constitute a vulnerability.

Basically the authors' reasoning is:

1. An Universal Turing machine is an interpreter/simulation that is capable of executing code written for any Turing machine, provided by the user.

2. The user code takes an external input - the initial content on the tape.

3. It's possible to maliciously craft an input (the content on the tape) to the user code to hijack the simulation, without modifying the user code.

Ah, that makes a ton of sense. Thanks!
Yes, that's the joke.
No it isn't. The paper forces the universal turing machine to execute a different machine than the one on the input tape by providing crafted input to the intended simulated machine. It isn't a joke paper.

U is fed machine T and input I. By only changing I you can force U to execute a different machine X.

Sure, but you achieve that by specifying I in a way you aren't really supposed to. You are effectively invoking undefined behavior.

The implications of this are well understood. You'd have similar vulnerabilities if you ran raw machine code on, say, an x86 processor. Enforcing checks and sanitization in such a way that the 'exploit' can't happen is the exact kind of job you'd have to do in order to write an operating system.

I get that the CVE is a joke, but are you saying the paper is as well?
It’s not a joke, people saying otherwise did not understand the paper. The significance is debatable, it’s mostly a kind of collective facepalm for theoretical computer scientists.

The CVE is a joke in the sense that you don’t really need a CVE for a universal Turing machine, as they are quite literally only academic.

i dont know if he says exactly that by i do say exactly that

paper is essentially saying that by using simplest machine possible

we prove it does not provide layers of security abstractions which were invented and deployed in last 40+ years to computers

so thats like getting CVE for turning off ASLR, AppARMOR SElinux etc in linux kernel.

As far as I can tell, yes, it is.

Showing a universal Turing machine is a proof of the fact that when you write stuff like the smn theorem or anything that uses universal functions you aren't writing nonsense.

There is no "contract" between the universal machine and the hypothetical user. One could certainly ensure that, say, the input TM is properly formatted, or that "the runtime" will only use a certain set of cells, and so on. The difficulties and problems one would find in trying to do so are the same one would face when writing an operating system and a compiler.

Wait until they try unicode, then the joke will be on them.
I cannot fathom how complicated arbitrary code execution could get with multi byte characters that could use shift registers, null bytes and byte order marks with determinism in a NOP slide on a heap spray.

Filtering only printable user input helps but even bit map images can expose a heap to a sensitive registers that will execute some target specific generated shell code.

https://en.m.wikipedia.org/wiki/NOP_slide.

https://en.m.wikipedia.org/wiki/Heap_spraying