Hacker News new | ask | show | jobs
by ghswa 4519 days ago
I'm enjoying this so far although a couple of things have made me scratch my head in the example code allocating a vector.

  int create_vector(struct vector *vc, int size) {

    vc->data = 0;
    vc->size = 0;

    if (vc == NULL) {
      return VECTOR_NULL_ERROR;
    }

    /* check for integer and SIZE_MAX overflow */
    if (size == 0 || size > SIZE_MAX) {
      errno = ENOMEM;
      return VECTOR_SIZE_ERROR;
    }
Accessing the fields of vc before the NULL check seems like an error. Also, would it not be simpler to change type of the size parameter to size_t so that it's the same type as the size field of the vector struct?
6 comments

Yes, by having the NULL check after those members have been accessed, you're telling the compiler that vc cannot be NULL (because accessing the members if vc is NULL would be undefined behavior). The compiler can (and GCC will) eliminate the NULL check completely.
Does gcc make any kind of assumptions about signed-overflow? For example, would it eliminate the comparison in this code:

  int8_t i;

  /* ... */

  i += 1;

  if (i == 0) {
    /* ... */
  }
Yes, it'll exploit the fact that signed overflow is undefined.

    int incr(int i) {
      int old = i;
      i += 1;
      if (i < old) return 1;
      return 0;
    }
GCC optimizes this to 'return 0;'.

EDIT: Updated to a better example.

Would you mind elaborating on that? I don't see this behavior with GCC 4.2.1 on OSX or OpenBSD at -O3 when I look at the disassembly.
Here's the simplest example I could come up with (it's functionally equivalent to the code OP pasted):

    int zero1(int* i) {
      *i = 0;
      return 0;
    }
    
    int zero2(int* i) {
      *i = 0;
      if(i == 0) return 1;
      return 0;
    }
GCC produces the same code for these two functions. If the 'if' in zero2 is moved to the top of the function, it takes effect.

I tried this on GCC 4.8.1. A wonderful site for checking these things is: http://gcc.godbolt.org/

This works. Also it does this without a notice, even when I ask for -Wall.

I found you can control this with -f{no-,}delete-null-pointer-checks. Apparently it's enabled by -O2.

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Optimize-Options...

Excellent, thank you. Those also compile differently for me, so it probably appeared in a newer version of gcc.
It's an optimization, it shows up at -O2.
Why put the if in at all then?

Regardless of optimisation, it will never be followed - it segfaults if NULL and returns normally if not.

huh :) have you actually tried doing that i.e. call the function with NULL params and see ? you will see a segfault. anyways, rather than rely on vagaries of -O2 implementation on compilers, i would rather be explicit about it and just assert invariants...
I believe you got my comment completely bass ackwards.

I'm not describing some optimization that you, the programmer, can do. I'm talking about what the compiler can (and will) do by assuming your code doesn't have undefined behavior.

In this particular case, the compiler recognizes that if the pointer is NULL, undefined behavior has already been invoked and it can therefore eliminate the check.

I recommend this series of posts on the LLVM blog: http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

> In this particular case, the compiler recognizes that if the pointer is NULL, undefined behavior has already been invoked and it can therefore eliminate the check.

ah, makes sense thank you !

  *you will see a segfault*
No, you will see undefined behavior. Undefined behavior doesn't mean "you get a segfault". That would be defined behavior.
Relying on the compiler is not the way to produce safe code.
We're in violent agreement. I am preaching that you should code to the standard, not implementations. See my reply to 'signa11'.
> Accessing the fields of vc before the NULL check seems like an error. Also, would it not be simpler to change type of the size parameter to size_t so that it's the same type as the size field of the vector struct?

What should be done in case vc is NULL? Currently it almost certainly causes a segfault, but I've seen people insist on returning an error code when a NULL argument is given but that might go unchecked. A segfault is easy to debug, a forgotten error code is not.

    int poke_foo(struct foo *foo) {
        // if you're writing a library, this is asking for trouble...
        if(foo == NULL)
            return -1; // eventually someone forgets to check for this
        foo->bar += 1;
        return 0;
    }
The only sensible thing you can do is add an assert. That will stop the program when the error happens and make it easy to debug.

    int poke_foo(struct foo *foo) {
        assert(foo != NULL); // asseting here...
        foo->bar += 1; // ... is not much better than segfaulting here
        return 0;
    }
But in the end, we're still reliably crashing the program if NULL is passed and it is as easy to debug as a segfault would. The assert may keep your static analysis tool (like Coverity) happy but it won't really make a huge difference.

NULL pointers are generally not very difficult to debug, what is hard is dealing with free()'d pointers and other non-null invalid pointer values.

"easy to debug as a segfault would"

I'm on a weird embedded platform where segfaults are almost impossible to debug; the only way to get information out is a log facility over USB that loses the most recent lines of log when it segfaults and resets :(

Would an assert be any better? I guess you could print out a message to the log and enter an infinite loop if an assertion fails.

Similar problems may haunt you when working in kernel space.

But still, null pointers are easy compared to other invalid pointers that can not be distinguished by their value.

Best practices in C are not as easy to make as in other languages because the environment the code runs in may be anything from a micro controller to an embedded platform to kernel space to modern user space apps. It's not so much about the language as it is about the environment the code runs in.

My actual solution is a combination of careful guesswork and a set of macros that write to a region of memory that is not zeroed at reboot. I use them to mark function entry and exit events; in lieu of an actual stack trace, this lets me progressively narrow down where the problem is.
Writing a dynamic array in C is a good exercise in the amount of details you are spared in languages which provide basic containers.

I recently revisited some example code I wrote many years ago [1], and found some new issues, and there are most likely still some I haven't noticed.

That being said, when you use a title like this article's, it is a good idea to check your example code thoroughly.

[1]: https://github.com/jibsen/scv

Thanks, for pointing out
You suggest using 'calloc' to avoid calculating the size of the allocation (which would be necessary with 'malloc'). This is fine, and you store the number of elements in the 'size' field. But later, in the grow function, you double 'size' and then use it as-is in the call to 'realloc' which expects size to be in bytes, not elements.
That is because to my knowledge we does not have something like "crealloc" to grow allocated memory effectively. Naive implementation of such function will lead always to coping memory from old to new area and effective would far more difficult to implement (thus more error prone IMHO).
Hi, my point is that you have a bug in the code. Your 'newsize' is the number of elements. When you call realloc (which expects a byte size) you have to take this into account and multiply with the element size. What you have in the code:

    newptr = realloc(vc->data, newsize);
should be changed to:

    newptr = realloc(vc->data, newsize * sizeof(int));
It is also impossible for a variable of type int to have a value greater than SIZE_MAX.
On conventional platforms, sure, but there's nothing in the standard stopping int being larger than size_t. It would make plenty of sense on segmented or small-pointer architectures, like x32 (word size 64 bit, pointer size 32 bit).
Yep, that is definitely error