Hacker News new | ask | show | jobs
by gjulianm 4843 days ago

    #define struct union
    #define else
That's evil. I have to do it in someone's code some day just to have some fun.

But, apart from that, it's a really nice compilation. I didn't know about the compile time checks of array sizes, but I have a doubt. What if I pass to a method declared

    int foo(int x[static 10])
this pointer

    int* x = (int*) calloc(20, sizeof(int));
Does the compiler skip the check? Does it give me a warning?

EDIT: Funnily enough, in Mac it doesn't give any warning, neither for pointers nor for undersized arrays (ie, foo(w[5]) doesn't give a warning). And I've compiled with -std=c99 -pedantic -Wall.

6 comments

Last time this came up on HN, it was thought to be a clang feature.

Edit: While we're talking about dark corners, please stop casting functions that return void * . If your code lacks the declaration of the function, the compiler will assume pre ANSI-C semantics and generate code returning an int.

On machines where pointers do not fit ints (basically all 64bit machines), you just silently (due to the cast there is no warning) truncated a pointer. Worse, it may work depending on the malloc implementation and how much memory you allocate.

We have to fix these kinds of bugs on OpenBSD a lot, please help by typing less and let the compiler warn you about silly mistakes :-)

And yes, C++ fucked this up for C. I'll leave it to Linus to say something nice about that..

> If your code lacks the declaration of the function, the compiler will assume pre ANSI-C semantics and generate code returning an int.

Better, please help by compiling your C code with -Wimplicit-function-declaration (included in -Wall), and fixing all the problems it reports. Then you won't have to worry about this problem, or a bunch of other problems.

All-too-common response: "Oh, but they're just warnings mumble compiles and runs 'cleanly' mumble..."

    -Wall -Wextra -Werror
It's a C99 feature, and clang's the only compiler I know of that produces the diagnostic (and only in later versions)

btw; I wrote this talk.

I didn't know that. To answer my previous question, clang doesn't fire a warning when passing pointers of any size to the function foo.

And by the way, nice talk, it's great learning these dark secrets of C.

The compiler can't know at compile time with a naked pointer like it can with an array. [static 1] is handy to say it must not explicitly be NULL, as if it were optional, however.
Yes, but I expected some kind of "You're passing a pointer as an array of size n. I can't check the size, but you should make sure you've checked it".
I can't imagine that would be anything but noise. 99% of my function calls have pointers passed through, not arrays.
But a pointer is not the same thing as an array, a pointer does not carry the size of the allocated space which an array does in the same scope.
Good, because I was missing a slide:

= Bitfields =

Not even once.

;)

Everybody knows about bitfields :)
Bitfields have some crazy fun dark corners, though. For instance, which values can this bitfield hold:

    int b:1;
Signed two's-complement n-bit integers can hold values from 2^(n-1)-1 to -2^(n-1), so in theory it can hold 0 and -1, but many compilers get that one wrong. Always declare one-bit bitfields as unsigned. (The Sparse static analyzer will warn you about that one.)
In my university they always told me to cast the return of void * functions, I thought it was just to avoid the warning saying "automatic cast of void* to int*" or similar.

Thanks, I will take that into account.

It is true in C++, not in C where void* needs no casting.

People tended to confuse C++ with C more in the past, as they had not diverged as much as they have now.

An annoying thing C++. Anyone know the history/rationale of this change?
It is supposed to provide you more checking by disabling automatic cast from void* to any other pointer. This makes sense in C++ since casting a pointer to a class can trigger some address adjustement if the target class of this instance pointed to is multiple derived (and maybe in other cases?). There is no way such adjustment can happen if the source type is void*, because then you don't know what the source type really is.
As far as I know, this is exactly the rationale, actually. Say you have:

  struct a { int a;}
  struct b { int b;}
  struct c : a, b { };

  c myc;
  c* p1 = &myc;
  b* pb1 = p1; // correctly points to myc's base b, which is offset

  void* p2 = &myc;
  b* pb2 = p2; // Would not point at instance of b, compile error
If I need void*-casting code to compile on both C and C++ compilers, I use a macro like this:

  #ifdef __cplusplus
  #define STATIC_CAST(T, EXPR) static_cast<T>(EXPR)
  #else
  #define STATIC_CAST(T, EXPR) (EXPR)
  #endif
This leaves the conversion to be implicit in C, and uses the stricter static_cast in C++ to catch certain types of likely-unsafe conversions, such as the aforementioned cast from int to pointer.
> While we're talking about dark corners, please stop casting functions that return void *

The problem is, if I want my C code to compile with MSVC, it has to compile as C++ - and even if I abhor Windows for development myself, a lot of developers are using MSVC.

I just wish Microsoft would update their C compiler, at least to C90. But then I suppose the standard has only been around for 23 years, and nobody really uses C anyway.

> The problem is, if I want my C code to compile with MSVC, it has to compile as C++

As someone who has compiled a lot of .c files that do not cast void pointers with cl.exe, I'd say no, this is not true... Maybe what you're trying to say is that your code relies on C99 features that are also present in C++? IMO if that's what's holding you back it's much easier to just write C89, maybe with the occasional ifdef, than to suddenly write some crummy C/C++ hybrid. That or just use mingw. (Unless you're doing SEH. I'm not aware of a good way to do SEH with mingw.)

You're suggesting I write C89 in 2013 just to support MSVC? C89 in which you can't even mix data and code?

I think casting void pointers is very much the lesser of two evils. And of course I can use MinGW, but that doesn't mean the majority of Windows developers don't insist on using MSVC.

C++ is not a superset of C[0]. If you want to use MSVC and newer features, then you're not writing C anymore; you're just writing an awkward C++ program.

[0] http://en.wikipedia.org/wiki/Compatibility_of_C_and_C%2B%2B

Nope, I'm definitely writing C99, but it just so happens that MSVC is happy enough pretending it's C++.

Definitely writing C99 in that I compile the exact same source files with -std=c99 -Wall -pedantic when using a real C compiler.

> C89 in which you can't even mix data and code?

I don't think this is as big a deal as you're making it out to be. Maybe I'm just used to it, but this is one area where I consider the C89 way to be better on stylistic grounds. (I wrote about this on a stack exchange site some time ago: http://programmers.stackexchange.com/questions/75039/why-dec...)

Realistically I don't think C99 is all that revolutionary, and the fact is there are plenty of people writing C89 in 2013. Let's look at some things C99 adds over C89:

* Mixed declarations and code

IMO not a huge deal, for reasons I give above.

* VLAs

A nice feature, but people have been using malloc for similar purposes for ages, so it's hard to say it's any better than "nice to have". I'd also argue that something that leads me to potentially consume arbitrary amounts of stack space based on runtime decisions is probably not universally good, and I'm pretty cautious about using VLAs even when given the choice.

* Last field of a struct can be variable sized array, eg. struct foo { int num_data; int data[]; };

I like this pattern and have used it a lot. Including with VC++ by having a zero-sized array at the end of the struct. This is once place where I use an ifdef to bridge the gaps because recently I noticed LLVM started doing weird things for this if you don't do it the "legit" C99 way.

* Cosmetic issues like // comments

VC++ already supports this in C89 files as a non-portable extension.

* Portable typedefs like uint32_t, etc.

These are also pretty handy. But if you're building with VC++ you're probably using the non-portable Microsoft ones anyway. (I think VC++ 2010 added the C99 versions too.)

* bool type

Again, every platform and even reasonably sized library works around this with a typedef and some macros.

* Slightly different behaviors for some libc functions

I notice this most for snprintf, where the pre-C99 return value [followed by Microsoft] means something different [and better] in C99. Again there are non-portable solutions for this that you can use ifdefs for.

* Named initializers for struct members

Cool feature. I'd hardly say I couldn't live without it, though. The only place I've really seen it used extensively is in the Linux kernel.

* Stuff that no one uses, such as "static" as used in these slides, or complex numbers, or whatever.

Not a problem since these don't see wide use.

Mixed declarations and code are absolutely essential in a modern C programming style. It's all very well to declare things like "your functions should be shorter" or "you shouldn't use temporary variables in macros", but it's just not practical in real world code. In fact, it's so impractical they amended the spec 14 years ago.

Then there are things like the portable integer types - which you decided aren't necessary because everyone is writing Win32 code (???), the designated initializers which I use every day to make my struct initializing code safe from changes in the struct itself, for loops, a consistent bool type ... heck, I could go on all day if you hadn't already dismissed the rest of the standard as "stuff that no one uses".

I really don't mean to come across as aggressive, but this kind of ignorant and dismissive attitude is so rampant in the tech scene at the moment. And it's getting us nowhere.

It's better to

typedef void ( * fptr_t)()

and cast to fptr_t instead of void * But it's bigger problem how to force users to cast it back to proper prototype, because casting it back to something else will give UB.

"On machines where pointers do not fit ints (basically all 64bit machines), you just silently (due to the cast there is no warning) truncated a pointer. Worse, it may work depending on the malloc implementation and how much memory you allocate."

Wow! Ugly! Scary! Another good reason to know in fine details just what cast does. At one point, my Visual Basic .NET code actually calls some old C code, and in time I will need to convert to 64 bit addressing. So, I will keep in mind that with 64 bits I have to be especially careful about pointers and C.

Well, the bug only appears if you didn't include the header file declaring malloc (or any other void*-returning function). If you did, there's no problem.

It's a pretty easy thing to leave out, missing a required header, but if you always compile under -Wall it'll catch this and many other problems as well.

Thanks for the details, but I remain concerned about taking a language, clearly with some tricky aspects, so closely identified with 16 and 32 bit computing into 64 bit computing.
It seems to me that the problem is that you're using an undefined function (when there are flags to stop this), not casting.
It's impossible to have a static decision procedure about dynamic properties of programs, such as the size of dynamically allocated memory areas (Rice's theorem). So, it is necessary to either include false positives (correct programs rejected) or false negatives (incorrect programs accepted).

Sound static analyzers fall in the first case, but require a lot of work to become precise enough to be used (ie, to reduce the number of false alarms). Compilers fall in the second case in the sense that they don't have to honor such a clause. And in the C99 norm it's actually a "shall" (it just couldn't honor a "must" in that case):

"If the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression."

> It's impossible to have a static decision procedure about dynamic properties of programs, such as the size of dynamically allocated memory areas (Rice's theorem).

You can not have a general procedure, but with the help of the programmer / user of the compiler, you can prove all kinds of things.

The purpose of

  int foo(int x[static 10])
is not to produce a warning - that's just a nice possible side-effect (and only in some cases).

The real purpose is to allow the compiler to optimise the compilation of the foo() function itself, under the assumption that x will always point to the first element of an array of at least 10 elements.

What possible optimization would a compiler be able to do, given that arrays are only ever implicit in compiled C?
If it knows the address is valid, it can use a speculative load. If it knows there are enough entries, it can use a wide load. Without that knowledge, such instructions could trigger SEGV due to an invalid pointer or the wide load spilling over a page boundary.
At least or exactly 10 elements?
At least
Huh, I didn't know that. I just checked the standard and you appear to be correct, so, just for the record:

"... if the keyword static also appears within the [ and ] of the array type derivation, then for each call to the function, the value of the corresponding actual argument shall provide access to the first element of an array with at least as many elements as specified by the size expression."

About evil preprocessor construct, there is a nice collection in the comment of this Jonh Regehr post : http://blog.regehr.org/archives/574
Isn't re-#defining language keywords disallowed by the C standard? It's just that most compilers don't complain about it.
The C++ standard has a clause prohibiting macros that re-#define keywords if the translation unit also #includes a standard header. I guess this is to clarify whether the standard-library functionality is expected to still work even in the face of such a #define, by specifying that implementors don't need to worry about that situation.

I don't believe C has any such restrictions, though.

The pre-processor is independent and a macro expanded before the C compiler sees it.
The preprocessor is a part of the standardized translation process, and if the standard says that certain things are not allowed in a well-formed C program, it does not matter at which stage the compiler is.
I'd be glad if you can pinpoint the location in the C standard where this constraint is given.
C99 Section 7.1.2.4 (on standard headers): "The program shall not have any macros with names lexically identical to keywords currently defined prior to the inclusion."

You can redefine keywords in your code, but they must not be defined when including standard headers (for obvious reasons).

Could someone explain to us newbies what the #define code does?...
It replaces the first word with what comes after. For example, #define foo bar is roughly equivalent to s/foo/bar/g, so #define else would just remove all else keywords. I'll let you guess what it does to a program.
Not to forget #define struct union which will pack all struct members together. Gives me chills even to think about it...
Can anyone think of a reason why all the examples are keywords? I would think if you were trying to cause some trouble, redefining standard functions would cause all kinds of chaos, e.g.

  #define memmove memcpy
  #define strncmp memcmp
  #define rand fork
  #define free(x)
  #define strerror(n) strerror((n)+1)
  #define memset(addr, byte, len) memset(addr, ~(byte), len)
The #define else one is scarier to me. #define struct union will almost certainly crash immediately on runtime, as one member's a pointer and it gets overwritten by an int or a pointer to a different type. #define else just always runs all else clauses - the code is likely still valid, it just does something very different from what the author intended.

Both of them would be a real pain to debug.

Both would be easy to notice in a debugger. Finding the cause, as you imply, would be tricky unless you remember to check the preprocessed output.
It's a great memory saver.