by zedshaw 4189 days ago
Alright my friend, here's the gist:

https://gist.github.com/zedshaw/c20a69f17578909523c4

The rules:

1. You said you can make that for-loop run forever: that you can call it and it'll enter an infinite loop.
2. To prove that, you can only alter the main function, then hand me back the code and I'll compile it and run it on my machines.
3. It has to run without stopping for 24 hours. If you can do that then I'll consider that an "infinite loop".
4. You can't call any more functions than what's in there already. So no fancy hacks to keep the OS from allowing segfaults by putting in signal handlers, linking against other libraries, or anything.

Very curious how you do this. This is fun!

Edit: And, I may not be checking comments so email me to gloat if you figure it out. help@learncodethehardway.org any time. You can also post it here. Just link me the reply so I can go look.

1 comments

Here you go:

    int main(int argc, char *argv[])
    {
        int offset = -63;
        char input[] = { 1, 1, 1 };
        char *output = input + offset;
        safercopy(3, output, 3, input);
        
        return 0;
    }

This is running on a Mac with OS X 10.10.1 and Xcode 6.1.1, compiled without optimizations. The offset value may need to be different on other architectures. With optimizations on, the approach may need to change. Don't give me any guff about the conditions needed, since that's the whole point of undefined behavior: it depends on context that should be irrelevant.

There's no need to run it for 24 hours. Just run it, then pause in the debugger and step through a few loops. It'll be evident that nothing changes.

If you need help getting it to work properly on your own setup, let me know.

Alright, I did finally get this working. Pretty fucking awesome, I had not thought of that. Here's a version everyone can try:

https://gist.github.com/zedshaw/64b3fb6b7ed653852619

I officially concede that because you can work two pointers on a computer to overwrite another location of memory to alter a for-loop (incidentally, there's no UB listed in ANSI for 'alter the variable of a for-loop') that everyone should go back to writing their C code just as K&R intended.

Please, you all should rely on only the '\0' byte terminator of all strings, don't do any bounds checking, don't check the return code of functions, and you will be totally safe.

Because, UB means "I ain't gotta fix it."

Enjoy, now I'm going painting.

Your insistence that undefined behavior is not at play here is bizarre. It is not possible to construct a pointer into another stack frame like my code does without invoking undefined behavior. To wit:

"When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."

This is a long-winded standards-language way of saying that if you compute x+y, where x points into an array and the result lands neither on an element of that array nor one past its end, the behavior is undefined. The moment I computed 'output', I hit UB. Everything that happens after that is up to the whims of the platform.
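
To make the quoted rule concrete, here is a minimal sketch of checking an offset before any pointer is formed. The helper names `offset_in_bounds` and `checked_offset` are hypothetical and do not come from the gist. The check compares integers only, so the check itself stays well defined; only offsets from 0 through the array length (one past the end) pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the gist: returns 1 if base + offset
 * would point at an element of an array of `len` elements or one past
 * its last element, and 0 otherwise. Only integers are compared, so no
 * out-of-range pointer is ever formed. */
int offset_in_bounds(size_t len, ptrdiff_t offset)
{
    if (offset < 0) {
        return 0;
    }
    return (size_t)offset <= len;
}

/* Forms base + offset only when the result is an element of the array
 * or one past its end; returns NULL instead of computing an invalid
 * pointer. */
char *checked_offset(char *base, size_t len, ptrdiff_t offset)
{
    assert(base != NULL);
    if (!offset_in_bounds(len, offset)) {
        return NULL;
    }
    return base + offset;
}
```

With the values from my main function, `checked_offset(input, sizeof input, -63)` returns NULL rather than a pointer into another stack frame.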

Things like this make me wonder why anyone is using C at all these days. Rust can't come soon enough.
The gulf between what C actually is and how most people assume it is can be scary.
C itself is also scary. Most other languages provide at least run-time safety; some provide great compile-time safety. C providing neither and being the most popular language for system software is what is really scary.

I guess part of what is scary about C is that it gives you the illusion of a high-level language, but unless you know all UB by heart, you might accidentally start working in assembly.

Isn't there at least a flag that activates warnings for stuff like this? I tried -Wall in both clang and gcc and they didn't say jack shit.

What do modern C developers do these days? Arm themselves with expensive advanced static analysis tools to their teeth?

Also everything that happens before it!
With clang 3.5 on x86-64 Linux I also get an infinite loop.

Compiled with GCC, it doesn't go into an infinite loop. Zed seems to think this makes your example a "fail". This is curious since the whole quibble is about functions that work in some context but fail when brought into a different context.

Edit: FWIW, safercopy also has undefined behavior when from_length and to_length are larger than INT_MAX.
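
One way to sidestep the int-range problem is to take the lengths as `size_t`, the type that `sizeof` produces. A minimal sketch, assuming a copy routine shaped roughly like the one under discussion; the name `bounded_copy` and its signature are made up for illustration and are not taken from the gist:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not the gist's safercopy: copies at most
 * to_len - 1 bytes from `from` into `to` and always NUL-terminates.
 * Lengths are size_t, so a buffer size above INT_MAX does not have to
 * be squeezed into an int parameter. Returns the number of bytes
 * copied, excluding the terminator. The buffers must not overlap. */
size_t bounded_copy(size_t from_len, const char *from, size_t to_len, char *to)
{
    assert(from != NULL && to != NULL);

    if (to_len == 0) {
        return 0; /* no room even for the terminator */
    }

    /* Copy whichever is smaller: the source length or the space left
     * after reserving one byte for the terminator. */
    size_t max = from_len < to_len - 1 ? from_len : to_len - 1;
    for (size_t i = 0; i < max; i++) {
        to[i] = from[i];
    }
    to[max] = '\0';
    return max;
}
```

This only addresses the length types. It does nothing about a caller handing in a pointer that was already invalid when it was computed.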

It's only curious if you think Zed is honestly trying to figure out the truth, rather than "prove" he's right. That assumption is clearly shown to be wrong by this comment thread.
Fail. I've run it over and over and it doesn't run forever. It segfaults or exits. But... you did cover a corner case I hadn't considered. Thanks!

The results are at https://gist.github.com/zedshaw/81edf35857e137ccd7d3

Did you adjust the offset like I said you would probably need to?
I can't see any changes in `safercopy` or in the calling code you provided - only the invocations from the shell showing the prog terminating.
My point is that the offset value needed to produce the described behavior depends on various implementation-specific things, so that constant may need to be altered when trying the code on other compilers, OSes, or CPU architectures.