Hacker News new | ask | show | jobs
by rwmj 1418 days ago
VLAs make it a lot easier to corrupt the stack by accident. Unless you're quite a careful coder, stuff like:

  f (size_t n)
  {
    char str[n];
leads to a possible exploit where the input is manipulated so n is large, causing a DoS attack (at best) or full exploit at worse. I'm not saying that banning VLAs solves every problem though.

However the main reason we forbid VLAs in all our code is because thread stacks (particularly on 32 bit or in kernel) are quite limited in depth and so you want to be careful with stack frame size. VLAs make it harder to compute and thus check stack frame sizes at compile time, making the -Wstack-usage warning less effective. Large arrays get allocated on the heap instead.

4 comments

> stuff like ... leads to a possible exploit where the input is manipulated so n is large

The same is true for most recursive calls, should recursion be also banned in programming languages?

When writing secure C? In most cases, absolutely.
That's not really a fair comparison though. Recursion is strictly necessary to implement several algorithms. Even if "banned" from the language, you would have to simulate it using a heap allocated stack or something to do certain things.

None of this applies to VLA arguments.

It's not strictly necessary precisely because all recursions can be "simulated" with a heap allocated stack. And in fact, the "simulated" approach is almost always better, from both a performance and a maintenance perspective.
This is simply nonsense. In cases with highly complex recursive algorithms, "unrecursing" would make the code a completely unmaintainable mess, requiring an immensely complicated state machine, which is why something like Stockfish doesn't do that in its recursive search function even though the code base is extremely optimised. And yes, some algorithms are inherently recursive, and don't gain any meaningfull performance from the heap stack + state machine approach.
> In cases with highly complex recursive algorithms, "unrecursing" would make the code a completely unmaintainable mess, requiring an immensely complicated state machine

Nothing about it is "immensely complicated". Rather than store your recursion state in a call stack, you can store it in a stack of your own, i.e. a heap-allocated container. The state of a cycle of foo(a,b,c) -> bar(d,e,f) -> baz(g,h) -> foo(...) becomes expressible as an array of tagged union of (a,b,c), (d,e,f) and (g,h).

And there is nothing inherently unmaintainable about this approach. I would hope that it's a commonly taught pattern, but even if it's not, that doesn't make it impossible to understand. Picking good names and writing explanatory comments are 90% of the battle of readability.

> which is why something like Stockfish doesn't do that in its recursive search function even though the code base is extremely optimised

I can't speak for what Stockfish devs do, as I have no insight into which particular developers made what set of tradeoffs in which parts of their codebase. But it doesn't change the reality that using your own stack container is almost always more performant and more extensible:

1. Your own stack container can take up less space per element than a call stack does per stack frame. A stack frame has to store all local variables, which is wasteful. The elements of your own stack container can store just the state that is necessary.

2. Your own stack container can recurse much farther. In addition to the previous point, call stacks tend to have relatively small memory limits by default, whereas heap allocations do not. In addition, you can employ tricks like serializing your stack container to disk, to save even more memory and allow you to recurse even farther.

3. Your own stack container can be deallocated, shrunk, or garbage collected to free memory for further use, but a call stack typically only grows.

4. Your own stack container can be easily augmented to allow for introspection, which would require brittle hackery with a traditional call stack. This can be an extremely useful property, e.g. for a language parser resolving ambiguities in context sensitive grammars.

> And yes, some algorithms are inherently recursive, and don't gain any meaningfull performance from the heap stack + state machine approach.

Using a heap-allocated stack container is recursion. It is the state machine. The only fundamental implementation difference between the approach I describe and traditional recursion is that the former relies on the programmer using an array of algorithm state, and the latter relies on the runtime using an array of stack frames.

First of all, my original point was only that comparing recursion to VLAs was not a reasonable comparison, not to make some profound point about your favourite way to implement recursion. So, chill dude.

1. This is often irrelevant, depending on your priorities. Taking stockfish as an example again, memory is not what's at a premium, search nodes is. The search space is inherently intractable. You're never gonna be able to recurse meaningfully deeper by shaving off some stack space, because the breadth of the tree grows exponentially in the number of plies. The only form of optimisation that helps here is caching and various heuristics to avoid searching certain nodes at all

2. You know you can change the stack size, right? This is what Stockfish does for its search threads. No need for fancy dynamic allocation here. Also, have fun watching shit get interesting interesting when you have to realloc() ncpu huge chunks of contiguous memory balls deep into a search, when the engine is in time trouble... Sometimes resizing allocated memory is simply not an option.

3. Again this is not always relevant. Stockfish needs its big stacks all the time. So big whoop.

4. This Stockfish does need to do, which is why it keeps an array of structs(never resized) for that purpose. But Stockfish also needs to make decisions about when things move between the different stacks, which is why it uses recursive calls despite having a stack on the heap also.

5. It's obvious I am aware of this. My original comment literally said that banning recursion would force you to implement recursion manually anyway using a state machine. Like, dude, you're literally repeating my own comment back at me as if I didn't already know it. What's up with that?

The point here is: yes, in some specialised cases it might be preferable to implement the recursion yourself if the problem calls for it. But other times, and I'd argue most of the time, this is not necessary. So just use the already available abstraction provided by language. Your line of reasoning is a bit like arguing for not using C at all, because it will be slower than assembly in some cases. sure, write hand optimised assembly in your hot paths if you need to, but most people don't. Abstractions are generally our friends, they help us write clearer, more consise code.

> It's not strictly necessary precisely because all recursions can be "simulated" with a heap allocated stack.

This just moves the problem from a stack blowout to a heap blowout.

> And in fact, the "simulated" approach is almost always better, from both a performance and a maintenance perspective.

I am unsure about the performance, but turning recursive code implementing a recursive procedure into iterative code which has to maintain a stack by hand cannot possibly improve readability unless the programmers involved are pathologically afraid of seeing recursive code.

Computers have many GiBs of heap space. Your thread has MiB of stack. Tell me. Which is the bigger problem?

This is also ignoring the fact that the memory usage for recursive algorithms is higher because there’s a bunch of state for doing the function call being pushed onto the stack that you just don’t see (return address, potentially spilling registers depending on the compiler’s ability to optimize, etc). Unless you stick with tail recursion but that’s just a special case where the loop method would be similarly trivial. Case in point. I implemented depth first search initially as a recursive thing and blue out the stack on an embedded system. Switched to an iterated depth first search with no recursion. No problem.

OP said “it’s the only way to solve certain problems”. That’s clearly not true because ALL recursive algorithms can be mapped to non recursive versions.

I never got the fascination with implicit recursion. It’s just a slightly different way to express the solution. Personally I find it usually harder to follow / fully understand than regular iterative methods that describe the recursion state explicitly (ie time and space complexity in particular I find very hard to reason about for recursion.)

Definitely not.
For the can be, or performance, or maintenance perspective?? With example please ..
You’ll find it difficult to beat a construct that has dedicated language support. Trying to fake function calls without actually making function calls is difficult and going to have poor ergonomics.
MISRA C bans recursion for instance.
Doesn't a similar DoS risk (from allowing users to allocate arbitrarily large amounts of memory) also apply to the heap? You shouldn't be giving arbitrary user-supplied ints to malloc either.
> Doesn't a similar DoS risk (from allowing users to allocate arbitrarily large amounts of memory) also apply to the heap?

DoS Risk? No one cares too much about that - the problem with VLAs is stack smashing, which then allows aribtrary user-supplied code to be executed.

You cannot do that with malloc() and friends.

VLAs don’t smash the stack.
Depends on the number you put inside, and the linker settings for stack size.
No.
How does a huge VLA corrupt the stack? If there's not enough space but code keeps going then isn't that a massive bug with your compiler or runtime?
Okay. How do you tell the kernel that? Sure, the kernel will have put a guard page or more at the end of the stack, so that if you regularly push onto the stack, you will eventually hit a guard page and things will blow up appropriately.

But what if the length of your variable length array is, say, gigabytes, you've blown way past the guard pages, and your pointer is now in non-stack kernel land.

You'd have to check the stack pointer all the time to be sure, that's prohibitive performance-wise. Ironically, x86 kind of had that in hardware back when segmentation was still used.

I think the normal pattern is a stack probe every page or so when there's a sufficiently large allocation. There's no need to check the stack pointer all the time.

But that's not my point. If the compiler/runtime knows it will blow up if you have an allocation over 4KB or so, then it needs to do something to mitigate or reject allocations like that.

> I think the normal pattern is a stack probe every page or so when there's a sufficiently large allocation.

What exactly are you doing there, in kernel code?

> But that's not my point. If the compiler/runtime knows it will blow up if you have an allocation over 4KB or so, then it needs to do something to mitigate or reject allocations like that.

Do what exactly? Just reject stack allocations that are larger than the cluster of guard pages? And keep book of past allocations? A lot of that needs to happen at runtime, since the compiler doesn't know the size with VLAs.

It's not impossible and mitigations exist, but it is pretty "extra". gcc has -fstack-check that (I think) does something there.

> What exactly are you doing there, in kernel code?

In kernel code?

What you're doing is triggering the guard page over and over if the stack is pushing into new territory.

> Do what exactly? Just reject stack allocations that are larger than the cluster of guard pages? And keep book of past allocations? A lot of that needs to happen at runtime, since the compiler doesn't know the size with VLAs.

Just hit the guard pages. You don't need to know the stack size or have any bookkeeping to do that, you just prod a byte every page_size. And you only need to do that for allocations that are very big. In normal code it's just a single not-taken branch for each VLA.

That seems to be what -fstack-check for gcc is doing:

"If neither of the above are true, GCC will generate code to periodically “probe” the stack pointer using the values of the macros defined below."[1]

I guess I'm wondering why this isn't always on if it solves the problem with negligible cost? Genuine question, not trying to make a point.

[1] https://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html

An attacker would first trigger a large VLA-allocation that puts the stack pointer within a few bytes of the guard page. Then they would just have the kernel put a return address or two on the stack and that would be enough to cause a page fault. The only way to guard against that would be to check that every CALL instruction has enough stack space which is infeasible.
Welcome to the world of undefined behavior. Anything can happen....
I think this is a common misunderstanding about UB. It's not that anything can happen, just that the standard doesn't specify what happens, meaning whatever happens is compiler/architecture/OS dependent. So you can't depend on UB in portable code. But something definite will happen, given the current state of the system. After all, if it didn't, these things wouldn't be exploitable either.
> But something definite will happen, given the current state of the system.

This is only true in the very loose and more or less useless sense that the compiler is definitely going to emit some machine code. What does that machine code do in the UB case? It might be absolutely anything.

One direction you could go here is you insist that surely the machine code has a defined meaning for all possible machine states, but that's involving a lot of state you aren't aware of as the programmer, and it's certainly nothing you can plan for or anticipate so it's essentially the same thing as "anything can happen".

Another is you could say, no, I'm sure the compiler is obliged to put out specific machine code, and you'd just be wrong about that, Undefined Behaviour is distinct from Unspecified Behaviour or merely Platform Dependant behaviour.

Many C and C++ programmers have the mistaken expectation that if their program is incorrect it can't do anything really crazy, like if I never launch_missiles() surely the program can't just launch_missiles() because I made a tiny mistake that created Undefined Behaviour? Yes, it can, and in some cases it absolutely will do that.

I'm aware you can get some pretty crazy behaviours, say if you end up overwriting a return address and your code begins to jump around like crazy. Even that could reproduce the same behaviour consistently though.

I once had a bug like that in a piece of AVR C code where the stack corruption would happen in the same place every time and the code would pathologically jump to the same places in the same order every time. It's worth noting though that when there's an OS, usually what will happen is just a SIGABRT. See the OpenBSD libc allocator for a masterclass in making misbehaving programs crash.

I was never advocating to rely on UB, btw. But yes, UB can be understood in many cases.

You are confusing the C standard and actual platforms/C implementations. A lot of things are UB in the standard but perfectly well defined on your platform. Standards don’t compile code, real compilers do. The standard doesn’t provide standard library implementations, the actual platform does.

Targeting the standard is nice, but if all of your target platforms guarantee certain behaviors, you might consider using those. A lot of UB in the C standard is perfectly defined and consistent across MSVC, GCC, Clang, and ICC.

> A lot of UB in the C standard is perfectly defined and consistent across MSVC, GCC, Clang, and ICC.

Do you have examples of this "a lot of UB in the C standard" which is in fact guaranteed to be "perfectly defined and consistent" across all the platforms you listed ? You may need to link the guarantees you're relying on.

What you are describing is unspecified and implementation-defined behavior [0].

Avoiding UB (edit: in general) doesn't have anything to do with the code being portable and everything with the code not being buggy [1][2].

[0] https://en.cppreference.com/w/c/language/behavior

[1] https://blog.regehr.org/archives/213

[2] http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

Oh really? Then why does every compiler I use have a parameter to turn off strict aliasing?

You cite to a source that contradicts you. In the llvm blog post: "It is also worth pointing out that both Clang and GCC nail down a few behaviors that the C standard leaves undefined."

Sometimes a compiler gives a guarantee that a particular UB is always handled in a specific way, but you cannot generalize this to all UB.

Added 'in general' to my comment to make this explicit.

What is undefined about a large VLA? It shouldn't be undefined.

According to wikipedia "C11 does not explicitly name a size-limit for VLAs"

The C standard has no mentions of a program stack. This isn’t undefined behavior.
You shouldn't be writing C if you're not a careful coder.
Yeah, right.

https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...

https://research.google/pubs/pub46800/

https://support.apple.com/guide/security/memory-safe-iboot-i...

Maybe you could give an helping hand to Microsoft, Apple and Google, they are in need of carefull C coders.

I'm not sure if you intentionally missed my point. Everything in C requires careful usage. VLAs aren't special: they're just yet another feature which must be used carefully, if used at all.

Personally, I don't use them, but I don't find "they're unsafe" to be a convincing reason for why they shouldn't be included in the already-unsafe language. Saying they're unnecessary might be a better reason.

The goal should be to reduce the amount of sharp edges, not increase them even further.
VLAs are unsafe in the worst kind of way as it is not possible to query when it is safe to use them. alloca() at least in theory can return null stack overflow, but there is no such provision with VLA.
They're not unsafe (in the memory sense) as long as they check for overflow and reliably crash if there is one.
If a lot of platforms don't implement this check reliably, then it's unsafe in practice at this time, even if not in theory.
And if you're a careful coder writing C, you should give the VLA the stink eye unless it's proving its worth.
Hint, that means nobody should be writing C.
Where is the lie?
Too bad we have all that legacy C code that won't just reappear by itself on a safer language.

That means there are a lot of not careful enough developers (AKA, human ones) that will write a lot of C just because they need some change here or there.