by option_key 1415 days ago
>Thanks to them being yet another attack vector and funny stuff like on this post, got demoted to optional on C11.

Sadly, the C committee doesn't really understand what was wrong with VLAs and a sizable group of its members wants to make them mandatory again:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2921.pdf ("Does WG14 want to make VLAs fully mandatory in C23")


What's wrong with VLAs is their syntax. They really shouldn't share the syntax of regular C arrays; with a distinct syntax, maybe a scary enough keyword, they would be fine. They are also more general than alloca: an alloca allocation lives until the function returns, while a VLA is scoped to the innermost block that contains it.
Syntax, no protection against stack corruption,...
You can corrupt the stack without VLAs just fine. What else?
VLAs make it a lot easier to corrupt the stack by accident. Unless you're quite a careful coder, stuff like:

  void f (size_t n)
  {
    char str[n];  /* n comes from the caller, unchecked */
    /* ... use str ... */
  }
leads to a possible exploit where the input is manipulated so that n is large, causing a DoS at best or a full exploit at worst. I'm not saying that banning VLAs solves every problem, though.

However, the main reason we forbid VLAs in all our code is that thread stacks (particularly on 32-bit systems or in the kernel) are quite limited in depth, so you want to be careful with stack frame size. VLAs make it harder to compute, and thus check, stack frame sizes at compile time, which makes the -Wstack-usage warning less effective. Large arrays get allocated on the heap instead.
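For what it's worth, the "fixed frame, heap for the big stuff" pattern can look roughly like this. It is only a sketch: SMALL_BUF and count_x are made-up names, and the 256-byte threshold is an assumption you would pick from your own stack budget.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit: largest scratch buffer we are willing to keep on the stack. */
#define SMALL_BUF 256

/* Copies n bytes of src into scratch space and counts the bytes equal to 'x'.
   Returns -1 if the heap allocation fails. */
long count_x(const char *src, size_t n)
{
    char small[SMALL_BUF];      /* fixed size: the frame size is known at compile time */
    char *buf = small;

    assert(src != NULL);

    if (n > sizeof small) {
        buf = malloc(n);        /* large inputs go to the heap instead */
        if (buf == NULL)
            return -1;          /* failure is reportable, unlike a stack overflow */
    }

    memcpy(buf, src, n);

    long hits = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == 'x')
            hits++;

    if (buf != small)
        free(buf);
    return hits;
}
```

Because the only stack array has a constant size, a warning like -Wstack-usage has a definite number to check against, and the oversized case fails with a return code rather than a crash.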

> stuff like ... leads to a possible exploit where the input is manipulated so n is large

The same is true for most recursive calls. Should recursion also be banned in programming languages?

When writing secure C? In most cases, absolutely.
That's not really a fair comparison though. Recursion is strictly necessary to implement several algorithms. Even if "banned" from the language, you would have to simulate it using a heap allocated stack or something to do certain things.

None of this applies to VLA arguments.

MISRA C bans recursion for instance.
Doesn't a similar DoS risk (from allowing users to allocate arbitrarily large amounts of memory) also apply to the heap? You shouldn't be giving arbitrary user-supplied ints to malloc either.
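A minimal sketch of what "don't hand user-supplied ints to malloc" can mean in practice. The function name alloc_checked and the 1 MiB cap are assumptions for illustration, not anything from a particular codebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical policy cap on a single request, in bytes. */
#define MAX_REQUEST_BYTES ((size_t)1 << 20)

/* Allocates count elements of elem_size bytes each.
   Returns NULL if count is zero, if the multiplication would overflow
   size_t, if the total exceeds the cap, or if malloc itself fails. */
void *alloc_checked(size_t count, size_t elem_size)
{
    assert(elem_size > 0);

    if (count == 0)
        return NULL;
    if (count > SIZE_MAX / elem_size)   /* count * elem_size would wrap around */
        return NULL;

    size_t bytes = count * elem_size;
    if (bytes > MAX_REQUEST_BYTES)      /* refuse oversized requests up front */
        return NULL;

    return malloc(bytes);
}
```

The point of the sketch is that the heap path has an obvious place to say no: every rejection is a NULL the caller can handle.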
> Doesn't a similar DoS risk (from allowing users to allocate arbitrarily large amounts of memory) also apply to the heap?

DoS risk? No one cares too much about that. The problem with VLAs is stack smashing, which then allows arbitrary user-supplied code to be executed.

You cannot do that with malloc() and friends.

How does a huge VLA corrupt the stack? If there's not enough space but code keeps going then isn't that a massive bug with your compiler or runtime?
Okay. How do you tell the kernel that? Sure, the kernel will have put a guard page or more at the end of the stack, so that if you regularly push onto the stack, you will eventually hit a guard page and things will blow up appropriately.

But what if the length of your variable length array is, say, gigabytes, so that you've blown way past the guard pages and your pointer is now in non-stack kernel land?

You'd have to check the stack pointer all the time to be sure, and that's prohibitive performance-wise. Ironically, x86 kind of had that in hardware back when segmentation was still used.
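Some toolchains take a cheaper route than checking on every push: when a large block is allocated, they touch the new region at strides of at most one page, so the guard page gets hit instead of jumped over (GCC's -fstack-clash-protection is one option in that spirit). Below is a rough simulation of the idea on an ordinary buffer. probe_pages is a made-up helper, and real probing is emitted by the compiler, not written by hand.

```c
#include <assert.h>
#include <stddef.h>

/* Touches buf[0..len) at strides of at most `page` bytes, highest address
   first, which is the order a downward-growing stack would be extended in.
   Returns the number of probes performed. */
size_t probe_pages(volatile char *buf, size_t len, size_t page)
{
    assert(page > 0);

    size_t probes = 0;
    size_t off = len;
    while (off > 0) {
        size_t step = off < page ? off : page;
        off -= step;
        buf[off] = 0;   /* on a real stack, landing on the guard page faults here */
        probes++;
    }
    return probes;
}
```

The cost is one store per page of the new allocation, paid only when a large block is carved out, rather than a check on every stack access.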

Welcome to the world of undefined behavior. Anything can happen....
You shouldn't be writing C if you're not a careful coder.
Yeah, right.

https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...

https://research.google/pubs/pub46800/

https://support.apple.com/guide/security/memory-safe-iboot-i...

Maybe you could give a helping hand to Microsoft, Apple and Google; they are in need of careful C coders.

And if you're a careful coder writing C, you should give the VLA the stink eye unless it's proving its worth.
Hint, that means nobody should be writing C.
Too bad we have all that legacy C code that won't just reappear by itself in a safer language.

That means there are a lot of not-careful-enough developers (AKA human ones) who will write a lot of C just because they need some change here or there.

With VLAs:

1. The stack-smashing pattern is simple, straightforward and sure to be used often. Other ways to smash the stack require some more "effort"...

2. It's not just _you_ who can smash the stack. It's the fact that anyone who calls your function will smash the stack if they pass some large numeric value.

They can overflow the stack. They cannot smash the stack.
Fair enough; I had the mistaken idea that the two terms are interchangeable, but apparently stack smashing is only used for the attack involving the stack:

https://en.wikipedia.org/wiki/Stack_buffer_overflow

so, pretend I said "overflow" instead of "smash" in my post.

Useless semantic pedantry at best, but arguably wrong, as there isn't some sort of ISO standard on dumb hacking terms.
What about not adding even more reasons why we should avoid using C?
> What about not adding even more reasons why we should avoid using C?

That's a mute point for C's target audience because they already understand that they need to be mindful of what the language does.

What the heck. It's "moot", not "mute".
That is like saying that if sushi knives are already sharp enough, there is no issue cutting fish with a samurai sword instead, except that with the knife maybe the damage isn't as bad.
VLAs are no more unsafe than standard C is for stack corruption.
Just one more attack vector to add to the list; who's still counting them?
It’s not an additional attack vector.

    int A[100000000];
Also has no protection.
the only result of banning VLAs is to force everyone to use alloca, which is even less safe.

exhibit A: https://lists.freedesktop.org/archives/mesa-commit/2020-Dece...

exhibit B: https://github.com/neovim/neovim/issues/5229

exhibit C: https://github.com/sailfishos-mirror/llvm-project/commit/6be...

etc etc

Nobody is forced to use alloca, which is not less safe, only equally disastrous. Just use malloc, already.
ah yes, why didn't I think of it, let me just try:

    #include <cstdlib>
    #include <span>
    
    __attribute__((annotate("realtime")))
    void process_floats(std::span<float> vec) 
    {
      auto filter = (float*) malloc(sizeof(float) * vec.size());
    
      /* fill filter with values */
    
      for(std::size_t i = 0; i < vec.size(); i++)
        vec[i] *= filter[i];
    
      free(filter);
    }

    $ stoat-compile++ -c foo.cpp -emit-llvm -std=c++20
    $ stoat foo.bc
    Parsing 'foo.bc'...

    Error #1:
    process_floats(std::span<float, 18446744073709551615ul>) _Z14process_floatsSt4spanIfLm18446744073709551615EE
    ##The Deduction Chain:
    ##The Contradiction Reasons:
     - malloc : NonRealtime (Blacklist)
     - free : NonRealtime (Blacklist)

oh noes :((
Here's a nickel, kid.

The bullshit about "oh, my embedded system doesn't have dynamic memory" is bullshit. You either know how big your stack is and how many elements there are, and you make the array that big. Or you don't know and you're fucked.

You can't clever your way out of not knowing how big to make the array with magic stack fairy pretend dynamic memory. You can only fuck up. Is there room for 16 elements? The array is 16. Is there room for 32? It's 32.
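Spelled out, "the array is 16" looks something like the sketch below. MAX_ELEMS, apply_filter and the 0.5 coefficients are placeholders for illustration; the capacity is an assumption you would derive from your actual stack budget.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity, chosen from the known stack budget. */
#define MAX_ELEMS 16

/* Multiplies vec in place by a filter held in a fixed-size stack array.
   Returns 0 on success, or -1 if n exceeds the capacity. */
int apply_filter(float *vec, size_t n)
{
    float filter[MAX_ELEMS];    /* constant size: no VLA, no alloca, no malloc */

    assert(vec != NULL);

    if (n > MAX_ELEMS)
        return -1;              /* reject oversized input instead of growing the stack */

    for (size_t i = 0; i < n; i++)
        filter[i] = 0.5f;       /* placeholder coefficients */

    for (size_t i = 0; i < n; i++)
        vec[i] *= filter[i];

    return 0;
}
```

The input size is still a run-time value, but the worst-case stack use is a compile-time constant and the too-big case is an explicit error path.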

I think the parent comment was about malloc not being real-time? Not about storage space.

Though I do wonder why there can't be a form of malloc that allocates in a stack like fashion in real time to satisfy the formal verifier?

Real time also generally means your input sizes are bounded and known, otherwise the algorithm itself isn't realtime and malloc isn't the reason why.

But strictly speaking the only problem is a malloc/free that can lock (you can end up with priority inversion). So a lock-free malloc would be realtime just fine, it doesn't have to be stack growth only.
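One way to get stack-like allocation without touching malloc on the hot path is a preallocated bump arena with mark/release. The sketch below is only that: the names and the 4096-byte size are assumptions, there is no locking, and each thread would need its own arena set up before the real-time section starts.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_BYTES 4096    /* hypothetical size, reserved up front */

typedef struct {
    _Alignas(max_align_t) unsigned char mem[ARENA_BYTES];
    size_t top;             /* offset of the first free byte */
} arena;

/* Bump-allocates bytes from the arena in constant time.
   Returns NULL when the arena does not have enough room left. */
void *arena_alloc(arena *a, size_t bytes)
{
    const size_t align = _Alignof(max_align_t);

    assert(a != NULL);

    if (bytes > ARENA_BYTES)
        return NULL;
    size_t rounded = (bytes + align - 1) / align * align;
    if (rounded > ARENA_BYTES - a->top)
        return NULL;

    void *p = a->mem + a->top;
    a->top += rounded;
    return p;
}

/* Remembers the current top so everything allocated later can be dropped at once. */
size_t arena_mark(const arena *a)
{
    return a->top;
}

/* Frees, in one step, everything allocated since the matching arena_mark. */
void arena_release(arena *a, size_t mark)
{
    assert(mark <= a->top);
    a->top = mark;
}
```

Allocation and release are a couple of arithmetic operations with no locks and no system calls, which is the property a real-time checker cares about. The trade-off is that the capacity has to be decided in advance.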

> Though I do wonder why there can't be a form of malloc that allocates in a stack like fashion in real time

I think that's basically what the LLVM SafeStack pass does -- stack variables that might be written out of bounds are moved to a separate stack so they can't smash the return address.

how would you implement that in the face of multiple threads? you can't use TLS, as it will have to initialize your stack on first access of your malloc_stack in a given thread, which may or may not be safe to do in real-time-ish contexts (I think it's definitely not on Windows, not sure on Linux)
> You either know how big your stack is

that's in most systems I target a run-time property, not a compile-time one

> wants to make them mandatory again

What does 'mandatory' mean? Like if I write a C compiler without them... what are they going to do about it?

Code that complies with the standard will be rejected by your compiler. The effect would probably be that few people would use your compiler.
Most mainstream compilers aim to be standards compliant
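As for what "optional" looks like from the code's side: since C11, an implementation that does not support VLAs defines the macro __STDC_NO_VLA__, so portable code can test for it and fall back. A small sketch, where sum_squares and the MAX_N bound are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_N 64    /* hypothetical upper bound on the element count */

/* Returns 1*1 + 2*2 + ... + n*n, using a scratch array of n elements.
   Returns -1 if n is zero, n exceeds MAX_N, or the fallback allocation fails. */
long sum_squares(size_t n)
{
    if (n == 0 || n > MAX_N)
        return -1;

#ifndef __STDC_NO_VLA__
    long sq[n];                         /* VLA path, bounded by the check above */
    long *buf = sq;
#else
    long *buf = malloc(n * sizeof *buf);    /* fallback where VLAs are unavailable */
    if (buf == NULL)
        return -1;
#endif

    for (size_t i = 0; i < n; i++)
        buf[i] = (long)(i + 1) * (long)(i + 1);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];

#ifdef __STDC_NO_VLA__
    free(buf);
#endif

    assert(total > 0);
    return total;
}
```

If VLAs became mandatory again, the #else branch would simply be dead code on any conforming compiler.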