| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ghswa 4519 days ago

I'm enjoying this so far although a couple of things have made me scratch my head in the example code allocating a vector.

  int create_vector(struct vector *vc, int size) {

    vc->data = 0;
    vc->size = 0;

    if (vc == NULL) {
      return VECTOR_NULL_ERROR;
    }

    /* check for integer and SIZE_MAX overflow */
    if (size == 0 || size > SIZE_MAX) {
      errno = ENOMEM;
      return VECTOR_SIZE_ERROR;
    }

Accessing the fields of vc before the NULL check seems like an error. Also, would it not be simpler to change type of the size parameter to size_t so that it's the same type as the size field of the vector struct?

6 comments

vinkelhake 4519 days ago

Yes, by having the NULL check after those members have been accessed, you're telling the compiler that vc cannot be NULL (because accessing the members if vc is NULL would be undefined behavior). The compiler can (and GCC will) eliminate the NULL check completely.

ghswa 4519 days ago

Does gcc make any kind of assumptions about signed-overflow? For example, would it eliminate the comparison in this code:

  int8_t i;

  /* ... */

  i += 1;

  if (i == 0) {
    /* ... */
  }

vinkelhake 4519 days ago

Yes, it'll exploit the fact that signed overflow is undefined.

    int incr(int i) {
      int old = i;
      i += 1;
      if (i < old) return 1;
      return 0;
    }

GCC optimizes this to 'return 0;'.

EDIT: Updated to a better example.

silentbicycle 4519 days ago

Would you mind elaborating on that? I don't see this behavior with GCC 4.2.1 on OSX or OpenBSD at -O3 when I look at the disassembly.

vinkelhake 4519 days ago

Here's the simplest example I could come up with (it's functionally equivalent to the code OP pasted):

    int zero1(int* i) {
      *i = 0;
      return 0;
    }
    
    int zero2(int* i) {
      *i = 0;
      if(i == 0) return 1;
      return 0;
    }

GCC produces the same code for these two functions. If the 'if' in zero2 is moved to the top of the function, it takes effect.

I tried this on GCC 4.8.1. A wonderful site for checking these things is: http://gcc.godbolt.org/

throwaway_yy2Di 4519 days ago

This works. Also it does this without a notice, even when I ask for -Wall.

I found you can control this with -f{no-,}delete-null-pointer-checks. Apparently it's enabled by -O2.

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Optimize-Options...

silentbicycle 4519 days ago

Excellent, thank you. Those also compile differently for me, so it probably appeared in a newer version of gcc.

throwaway_yy2Di 4519 days ago

It's an optimization, it shows up at -O2.

dkersten 4518 days ago

Why put the if in at all then?

Regardless of optimisation, it will never be followed - it segfaults if NULL and returns normally if not.

signa11 4519 days ago

huh :) have you actually tried doing that i.e. call the function with NULL params and see ? you will see a segfault. anyways, rather than rely on vagaries of -O2 implementation on compilers, i would rather be explicit about it and just assert invariants...

vinkelhake 4519 days ago

I believe you got my comment completely bass ackwards.

I'm not describing some optimization that you, the programmer, can do. I'm talking about what the compiler can (and will) do by assuming your code doesn't have undefined behavior.

In this particular case, the compiler recognizes that if the pointer is NULL, undefined behavior has already been invoked and it can therefore eliminate the check.

I recommend this series of posts on the LLVM blog: http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

signa11 4519 days ago

> In this particular case, the compiler recognizes that if the pointer is NULL, undefined behavior has already been invoked and it can therefore eliminate the check.

ah, makes sense thank you !

EpicEng 4519 days ago

  *you will see a segfault*

No, you will see undefined behavior. Undefined behavior doesn't mean "you get a segfault". That would be defined behavior.

fit2rule 4519 days ago

Relying on the compiler is not the way to produce safe code.

vinkelhake 4518 days ago

We're in violent agreement. I am preaching that you should code to the standard, not implementations. See my reply to 'signa11'.

exDM69 4519 days ago

> Accessing the fields of vc before the NULL check seems like an error. Also, would it not be simpler to change type of the size parameter to size_t so that it's the same type as the size field of the vector struct?

What should be done in case vc is NULL? Currently it almost certainly causes a segfault, but I've seen people insist on returning an error code when a NULL argument is given but that might go unchecked. A segfault is easy to debug, a forgotten error code is not.

    int poke_foo(struct foo *foo) {
        // if you're writing a library, this is asking for trouble...
        if(foo == NULL)
            return -1; // eventually someone forgets to check for this
        foo->bar += 1;
        return 0;
    }

The only sensible thing you can do is add an assert. That will stop the program when the error happens and make it easy to debug.

    int poke_foo(struct foo *foo) {
        assert(foo != NULL); // asseting here...
        foo->bar += 1; // ... is not much better than segfaulting here
        return 0;
    }

But in the end, we're still reliably crashing the program if NULL is passed and it is as easy to debug as a segfault would. The assert may keep your static analysis tool (like Coverity) happy but it won't really make a huge difference.

NULL pointers are generally not very difficult to debug, what is hard is dealing with free()'d pointers and other non-null invalid pointer values.

pjc50 4519 days ago

"easy to debug as a segfault would"

I'm on a weird embedded platform where segfaults are almost impossible to debug; the only way to get information out is a log facility over USB that loses the most recent lines of log when it segfaults and resets :(

exDM69 4519 days ago

Would an assert be any better? I guess you could print out a message to the log and enter an infinite loop if an assertion fails.

Similar problems may haunt you when working in kernel space.

But still, null pointers are easy compared to other invalid pointers that can not be distinguished by their value.

Best practices in C are not as easy to make as in other languages because the environment the code runs in may be anything from a micro controller to an embedded platform to kernel space to modern user space apps. It's not so much about the language as it is about the environment the code runs in.

pjc50 4518 days ago

My actual solution is a combination of careful guesswork and a set of macros that write to a region of memory that is not zeroed at reboot. I use them to mark function entry and exit events; in lieu of an actual stack trace, this lets me progressively narrow down where the problem is.

jibsen 4519 days ago

Writing a dynamic array in C is a good exercise in the amount of details you are spared in languages which provide basic containers.

I recently revisited some example code I wrote many years ago [1], and found some new issues, and there are most likely still some I haven't noticed.

That being said, when you use a title like this article's, it is a good idea to check your example code thoroughly.

[1]: https://github.com/jibsen/scv

pfacka 4519 days ago

Thanks, for pointing out

vinkelhake 4519 days ago

You suggest using 'calloc' to avoid calculating the size of the allocation (which would be necessary with 'malloc'). This is fine, and you store the number of elements in the 'size' field. But later, in the grow function, you double 'size' and then use it as-is in the call to 'realloc' which expects size to be in bytes, not elements.

pfacka 4519 days ago

That is because to my knowledge we does not have something like "crealloc" to grow allocated memory effectively. Naive implementation of such function will lead always to coping memory from old to new area and effective would far more difficult to implement (thus more error prone IMHO).

vinkelhake 4518 days ago

Hi, my point is that you have a bug in the code. Your 'newsize' is the number of elements. When you call realloc (which expects a byte size) you have to take this into account and multiply with the element size. What you have in the code:

    newptr = realloc(vc->data, newsize);

should be changed to:

    newptr = realloc(vc->data, newsize * sizeof(int));

eliteraspberrie 4519 days ago

It is also impossible for a variable of type int to have a value greater than SIZE_MAX.

scatters 4518 days ago

On conventional platforms, sure, but there's nothing in the standard stopping int being larger than size_t. It would make plenty of sense on segmented or small-pointer architectures, like x32 (word size 64 bit, pointer size 32 bit).

pfacka 4519 days ago

Yep, that is definitely error