Hacker News new | ask | show | jobs
by zokier 2256 days ago
> This matters because, if one is clever, it provides an escape hatch from system which is small, predictable, controllable, and secure, to one which could do anything. It’s hard enough to make a program do what it’s supposed to do without giving anyone in the world the ability to insert another program into your program, which can then interfere with or take over its host

I take issue with the statement that TC means that something could "do anything". It means it can theoretically compute anything, but Turing completeness does not in itself give access to any additional resources; being TC does not magically allow something to talk to internet or write to disk or spawn new processes, or possibly not even allocate new memory. Computers do so much more than just compute; TC might be the first step towards being able to "do anything" but it certainly is not the final step.

From security point of view, what TC can (but not necessarily!) is open the host to denial of service by excessive resource consumption, or by non-terminating program. But as another comment noted, also non-TC systems can consume impractically large amounts of resources even if their resource consumption is not theoretically unbounded (as it is afaik with Turing machines).

4 comments

Author of one of the cited papers here. The author of the post falls into a common misconception of the weird machine literature (which led me to write my paper): conflating TC (the ability to compute any function worth computing) with the ability to transition the victim machine into states that should be unreachable via paths that should not exist (“weird machine programming”). It is a bit unfortunate that this misunderstanding is pervasive in early WM papers :-/ - this ensures perpetuation of the misunderstanding.
Complexity theorist here. In addition to your point, there's another commonly-overlooked problem: TC isn't quite the actual top! There are ways to make problems that, even with an oracle for solving Halting, are still hard [0].

It seems like folks are very quick to confuse the expressiveness of a machine with the expressiveness of analyzing programs for that machine; usually, a program is far harder to analyze than to run.

I think that a better way to view the post's author's point is by unpredictability. Given a short program in a weak setting, we can not only predict what the program will do, but what the program cannot do, usually because it is too short or too simple. In a Turing-complete setting, though, there are short programs with very unpredictable behavior.

[0] https://en.wikipedia.org/wiki/Turing_degree

I don't think this discussion thread is giving the writer's concern enough credit. Universality on its own obviously doesn't allow the instance to take over its host, but it can enable the bulk of the malicious payload to be encoded as legitimate instances of whatever P-hard optimization problem the cloud service solves, so that it need not be injected directly via the actual vulnerability that the malicious actor uses to take over the host.

> TC isn't quite the actual top! There are ways to make problems that, even with an oracle for solving Halting, are still hard [0].

I'm not sure why you're raising the subject of the degrees here. Malicious software doesn't need to have hypercomputational powers to be a threat.

I think that, if you're going to care about being exploited by real-world payloads, then Turing-completeness is a red herring; I agree with the thread-starter. For example, it is already bad enough to not be able to tell when computations are in P vs. NP, for responsiveness under load. It is not good when an NP database query halts a P Web server. For this reason, languages like Pola [0] which are far weaker than Turing-completeness are valuable.

And, if you thought that it was easy to be accidentally Turing-complete, wait until you see how easy it is to be accidentally NP [1]. The typical database query is in NP, because constraint satisfaction problems are in NP. So is the typical optimization problem.

[0] https://www.researchgate.net/publication/266217730_Pola_a_la...

[1] https://en.wikipedia.org/wiki/List_of_NP-complete_problems

I do want to add (as my parent comment was bit negative) that I do think non-TC programming is an area that deserves more research, as is the research around better characterizing TC programs (http://raml.co/ being example of that)
Yeah, it seems like "Turing-complete" is becoming a misused meme. Here's related post and thread about the Dhall configuration language:

http://www.haskellforall.com/2020/01/why-dhall-advertises-ab...

My comment:

That is, it’s fine (and often necessary) for a config language to be Turing complete. (This doesn’t mean it needs global variables or mutation, etc.)

The real issue is side effects (I/O), which is not technically related to Turing-completeness.

https://lobste.rs/s/gcfdnn/why_dhall_advertises_absence_turi...

And for some bizarre reason a couple people replying keep insisting that this serves some purpose, even though the post basically admits it's the wrong issue, but they're doing it as "signaling mechanism" (i.e. marketing to people who hold a misconception about computer science.)

But that's not even the end of it -- as mentioned:

not only is the messaging focused on an irrelevant thing, but the language design is too! It would almost make more sense if the messaging were the only odd thing.

I find your first link particularly ironic. It claims that certain words are used beyond their literal meaning as a "signaling mechanism" for "like-minded people" that you're doing the right thing. And "non-Turing completeness" is one such "shibboleth" that signals to security-minded people that your configuration language is very secure and awesome.

Now I could claim on the contrary that whenever you're using "security" and "non-Turing completeness" in that way together, the only thing you're actually signaling is that you don't know WTF you're talking about.

Yeah, I'm not so sure. Security is about ensuring certain guarantees. If your configuration language is Turing complete, it's easy to get into a spot where you simply can't be certain of the final state of your system. That's not secure. Turing complete configuration language might be an instant red flag, like "perpetual motion" is to physicists.

It's possible to ensure termination of Turing complete languages by rejecting certain programs, but the work required is not something you'll find in a config file parsing library.

I’m unsure of how you mean that in a technical sense. It’s straightforward to make those guarantees in your interpreter.

And just to be clear, Dhall, the configuration language we’re talking about, is not TC, but powerful enough to compute the Ackermann function: https://gist.github.com/Gabriel439/77f715350ecc0443eed5fa613...

Add “ackermann 10 10” to your configuration file and you have something that’s technically proven to terminate, but won’t do so before the sun burns out. No security properties are gained here.

The security property gained is that with terminatation, other sorts of analyses that check security properties of interest now become possible to prove. It doesn't mean they'll be tractable, but "possible" is a necessary precondition for tractable.
> Turing completeness does not in itself give access to any additional resources; being TC does not magically allow something to talk to internet or write to disk or spawn new processes, or possibly not even allocate new memory. Computers do so much more than just compute; TC might be the first step towards being able to "do anything" but it certainly is not the final step.

In many cases it is the final step.

If you're trying to secure something that lacks any good reason to access the internet, it shouldn't be able to. And yet so many things like that still have internet access.

This creates a problem when you have a program which is only supposed to process some sensitive data and not export it off to the attacker, because as soon as the attacker can execute their own code, the process already had access to the sensitive data and to the internet. Or there is no sensitive data but the process already had access to the internet, so now the attacker is using your hardware to mine cryptocurrency or route their network traffic through your IP address.

We could stop giving network access to processes that otherwise shouldn't need it, but that requires overcoming the incumbent economic forces that use network access for telemetry and advertising. So there are a lot of people hoping that making things that aren't Turing-complete is easy. But it turns out to be pretty hard. So we may have to start pushing back against those economic forces.

> because as soon as the attacker can execute their own code, the process already had access to the sensitive data and to the internet.

That's where your thinking goes wrong. TC does not mean that the program can take over the host process control flow.

> That's where your thinking goes wrong. TC does not mean that the program can take over the host process control flow.

But that's commonly what happens in practice. Return into libc and similar do exactly that.

Not only that, compromising the host process control flow is not strictly required. You may be executing weird machine instructions and not machine code instructions, but the weird machine being inside the host process often means that it already has access to at least some of the host process address space. When it's the whole address space or even an interesting subset (e.g. read access to sensitive data, write access to anything that goes into an outgoing network buffer), achieving TC is already the end of the game.

Access to host process data need not even be so direct. Once you can execute weird machine code inside the host process, it enables timing attacks that may reveal more host process data -- especially when you're inside the host process control flow, even if you don't fully control it. Exporting data is likewise possible if you can bring about any externally-visible change in the host process behavior whatsoever, e.g. the timing of outgoing network packets.

There are cases where none of these things are true but the cases where they are true are common. And they're also not generally regarded as a vulnerabilities that could justify mitigating them with things like denying network access, even though maybe they should be.

A good illustration of this is the linked TC regex[1], which can be implemented in Notepad++ or similar.

Just because you can hit "Replace all" a bunch of times to run the TC "regex code", transforming some input to some output, doesn't in itself mean you can make Notepad++ do something weird.

[1]: https://github.com/Tegmen/RegEx-Brainfuck-Interpreter