by notacoward 4221 days ago
Mostly good advice, sometimes even great, but the part about typedefs is total BS. Any non-trivial program will use values that have clearly different meanings but end up being the same C integer type. One's an index, one's a length, one's a repeat count, one's an enumerated value ("enum" was added to the language to support this very usage), and so on. It's stupid that C compilers don't distinguish between any two types that are the same width and signedness; why compound that stupidity? Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result. Being able to change one type easily might also make some future maintainer's life much better. There's practically no downside except for having to look up the type in some situations (to see what printf format specifier to use), but that's a trivial problem compared to those that can result from not using a typedef.

Don't want to use typedefs? I think that's a missed opportunity, but OK. Don't use them. OTOH, anyone who tries to pretend that the bad outweighs the good, or discourage others from using them, is an ass. Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally.


Most of that section is concerned with hiding structs or pointers as typedefs: "In general, a pointer, or a struct that has elements that can reasonably be directly accessed should _never_ be a typedef."

Say you are reading a function, and see a local variable declared: "something_t variable_name;". Is it a struct, a pointer, or a basic type? Now compare with "struct something * variable_name;", which is clearly a pointer. If on the other hand it is "struct something variable_name;", you know that it's a struct allocated on the very small kernel stack (less than 8KiB per thread) - something which wouldn't be as clear if the fact that it's a struct were hidden by a typedef.
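To make the stack-size point concrete, here is a minimal sketch. The struct and both typedef names are made up for illustration and are not taken from the kernel; the 4096-byte payload is likewise an arbitrary assumption.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct request {
    char payload[4096];
    int  status;
};

typedef struct request  request_t;      /* hides the fact that it is a struct */
typedef struct request *request_ref_t;  /* hides the fact that it is a pointer */

/* The two declarations below look alike at the use site,
 * but differ hugely in how much stack they consume. */
size_t stack_cost_by_value(void)
{
    request_t r;              /* the whole struct lives on the stack */
    return sizeof r;
}

size_t stack_cost_by_ref(void)
{
    request_ref_t r = NULL;   /* only a pointer lives on the stack */
    return sizeof r;
}
```

With "struct request r;" versus "struct request *r;" the difference would be visible without looking anything up.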

There are three main reasons to use typedefs: to allow for changes to the underlying type; to add new information to the underlying type (which is item (c) in that section); and to hide information. Since the Linux kernel runs in a constrained environment (as I mentioned, the kernel stack is severely limited, among other things), hiding information without a good reason is frowned upon. It's the same reason they use C instead of C++; the C++ language idioms hide more information.

> Both humans and static analyzers could tell the difference if you used typedefs, and avoid quite a few bugs as a result.

The Linux kernel does that! As I mentioned, it's item (c): "when you use sparse to literally create a _new_ type for type-checking." See for instance the declaration of gfp_t:

  typedef unsigned __bitwise__ gfp_t;
The __bitwise__ is for the Sparse static checker. There are other similar typedefs, like __le32, which holds a little-endian value; the Sparse checker will warn you if one is used incorrectly (without converting to "native" endian).
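A rough sketch of how that kind of annotation can be wired up, assuming (as far as I know this matches the kernel headers) that Sparse defines __CHECKER__ and understands the "bitwise" and "force" attributes. The SKETCH_ macros, sketch_le32, and the two helper functions are hypothetical names, not the kernel's own.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: under Sparse, __CHECKER__ is defined and these attributes
 * are recognized. A regular compiler sees empty macros instead. */
#ifdef __CHECKER__
#define SKETCH_BITWISE __attribute__((bitwise))
#define SKETCH_FORCE   __attribute__((force))
#else
#define SKETCH_BITWISE
#define SKETCH_FORCE
#endif

/* Hypothetical little-endian type; the checker treats it as distinct
 * from a plain uint32_t, the compiler treats it as an alias. */
typedef uint32_t SKETCH_BITWISE sketch_le32;

/* Native -> little-endian, done byte by byte so it works on any host. */
sketch_le32 sketch_cpu_to_le32(uint32_t v)
{
    unsigned char b[4];
    uint32_t raw;

    b[0] = (unsigned char)(v & 0xffu);
    b[1] = (unsigned char)((v >> 8) & 0xffu);
    b[2] = (unsigned char)((v >> 16) & 0xffu);
    b[3] = (unsigned char)((v >> 24) & 0xffu);
    memcpy(&raw, b, sizeof raw);
    return (SKETCH_FORCE sketch_le32)raw;   /* explicit, checker-approved cast */
}

/* Little-endian -> native. */
uint32_t sketch_le32_to_cpu(sketch_le32 v)
{
    unsigned char b[4];
    uint32_t raw = (SKETCH_FORCE uint32_t)v;

    memcpy(b, &raw, sizeof raw);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

The idea is that any code mixing a sketch_le32 with a plain integer without going through the conversion helpers is what the checker would be expected to complain about; an ordinary compiler build is unaffected.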
> it's item (c): "when you use sparse to literally create a _new_ type for type-checking."

The problem is that this is presented as an exception that must be (strongly) justified. I think that using typedefs for integer types should be acceptable by default, and there should be specific rules for when to avoid them. The burden of proof is being put on the wrong side.

Even for structs, the argument for typedefs is stronger than the argument against. Even across a purely internal API, the caller often doesn't need to know whether something is an integer, a pointer to a struct, a pointer to a union, a pointer to another pointer, or whatever. Therefore they shouldn't need to know in order to write a declaration, which will become stale and need to be changed if the API ever changes. This is basic information hiding, as known since the 60s. Exposing too much reduces modularity and future flexibility. I've been working on kernels for longer than Linus, and the principle still applies there.

Again, it comes down to defaults and burden of proof. The rule should be to forgo struct typedefs only if every user provably needs to know that it's a struct and what's in it (which is often a sign of a bad API). Even then, adding a typedef hardly hurts; anyone who needs to know that a "foo_t" is a "struct foo" and can't figure it out in seconds shouldn't be programming in the kernel or anywhere else.
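For what it's worth, the information-hiding argument can be sketched like this. Everything here (counter_t and the four functions) is a hypothetical API invented for the example; in real code the first part would sit in a header and the second in a single .c file.

```c
#include <stdlib.h>

/* --- what a hypothetical header would expose --- */
typedef struct counter counter_t;   /* incomplete type: layout is hidden */

counter_t *counter_new(void);
void counter_bump(counter_t *c);
unsigned long counter_value(const counter_t *c);
void counter_free(counter_t *c);

/* --- what only the implementation file would see --- */
struct counter {
    unsigned long hits;
};

counter_t *counter_new(void)
{
    return calloc(1, sizeof(counter_t));   /* zero-initialized, or NULL */
}

void counter_bump(counter_t *c)
{
    if (c)
        c->hits++;
}

unsigned long counter_value(const counter_t *c)
{
    return c ? c->hits : 0;
}

void counter_free(counter_t *c)
{
    free(c);
}
```

Callers that only see the first part can hold and pass a counter_t pointer but cannot touch its members, so the layout can change without touching their declarations.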

> the caller often doesn't need to know whether something is an integer, a pointer to a struct, a pointer to a union, a pointer to another pointer, or whatever.

The parent provided a very good example: a structure takes up a lot more space than a single int/pointer type, and passing them by value is usually an unnecessary copy.

> and need to be changed if the API ever changes.

If the API changes then changing the declarations is likely to be trivial in comparison to the other changes that would need to be made to all the code using it.

> Exposing too much reduces modularity and future flexibility.

...and exposing too little reduces understanding of the details, which I think is far more important especially for a kernel.

> a structure takes up a lot more space than a single int/pointer type

Not necessarily. Many structures, especially those used to make up for the lack of tuples/lists in C, are very small. The real difference is between large and small objects. Knowing which is which is part of the essential discipline of being a kernel (or embedded) programmer, and is hardly affected by whether or not typedefs are used.

> changing the declarations is likely to be trivial in comparison

That's generally true of pointer typedefs, which is why I don't particularly care for them and said so in another sub-thread. I think it's much less likely to be true for integer/enum or struct/union typedefs. For example, in the integer/enum case, the most common scenario is a change to a parameter's real type without changing its width or sign. The compiler won't flag that, even though it can cause real problems. Giving the compiler more information should be encouraged, not discouraged, even if there are exceptions either way.
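A small sketch of what "the compiler won't flag that" looks like in practice. In C, plain typedefs are aliases rather than new types, so the swapped call below compiles; the typedef names carry information only for human readers and for external checkers. The names row_t, col_t, and cell_offset are hypothetical.

```c
#define GRID_WIDTH 10u

/* Two typedefs with different meanings but the same underlying type. */
typedef unsigned int row_t;
typedef unsigned int col_t;

unsigned int cell_offset(row_t row, col_t col)
{
    return row * GRID_WIDTH + col;
}

unsigned int intended_call(void)
{
    row_t row = 2;
    col_t col = 5;
    return cell_offset(row, col);   /* 2 * 10 + 5 */
}

unsigned int swapped_call(void)
{
    row_t row = 2;
    col_t col = 5;
    /* Arguments in the wrong order: the compiler accepts this,
     * because row_t and col_t are the same type to it. */
    return cell_offset(col, row);   /* 5 * 10 + 2 */
}
```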

> more important especially for a kernel.

Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply? Being able to know is not the same as being forced to know. Kernel programmers are already more burdened than others with concerns that they need to think about for every line. Forcing more at them when it's not necessary doesn't help anyone. If you need to know whether something's a pointer to a struct or an array of something even though you never dereference into either (maybe you just pass it back or onward to another function), then somebody's wasting your precious time. Believe me, I know all about the tighter resource constraints for kernel code. OTOH, the people who worked on the AIX and Solaris kernels still knew and applied this stuff. They didn't have the anti-CS attitude that seems rampant among Linux kernel devs, and IMO they were better for that. If an RTOS for tiny devices can have decent modularity - and I've seen some that do - then why can't a full-blown kernel?

> Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply?

Why do you think that these "software-engineering principles" should apply? I think the fact that the Linux kernel works, and it works quite well, is strong enough evidence that they don't matter.

> the people who worked on the AIX and Solaris kernels still knew and applied this stuff.

I don't know about AIX, but there's a reason Solaris has been called "Slowlaris"...

> If an RTOS for tiny devices can have decent modularity

But is that modularity actually necessary? I've worked with plenty of overly complex applications that were far less efficient and harder to understand as a whole than they needed to be, and most of them were the result of dogmatic adherence to principles of modularity, encapsulation, extensibility, etc. (none of which actually improved anything from the point of view of either the users or the ones trying to figure out how everything works), so maybe that "anti-CS attitude" is a good thing after all...

Usually when people say stuff like this, they haven't programmed anything as complex and performant as the thing they are criticizing, so the comments can and should be disregarded as noise.
Nice ad hominem you've got there. Was it directed at me, or my interlocutors? If it was directed at me, it's not only a fallacy but based on a false premise, as I've worked on seven UNIX kernels plus NT since 1989. That includes HA systems, FT systems, supercomputers, etc. so I don't think one can reasonably say I haven't dealt with some complexity before. Maybe we should delve into your experience to see if you know what you're talking about . . . but no, that would be just as fallacious.

There have been some good comments in this thread, but the only "noise" is from those who haven't even tried to present an argument one way or the other. I get that some people would draw the lines between good vs. bad use of typedefs differently than I would. I'm OK with that, as long as there's some kind of rational decision process behind it. The problem is that often there doesn't seem to be. Aesthetic concerns or the trivial difficulty of getting from the typedef to the underlying type do not, in my opinion, stand against the proven benefits of modularity or robust type checking.

"Why do people persist in this belief that a kernel is some mystical realm where software-engineering principles don't apply?"

It's not that it's a "mystical realm where software-engineering principles don't apply", it's that - like embedded - it's a domain where you often face tighter resource constraints. Applying the same engineering principles with different constraints can lead to different trade-offs, and ultimately different best practices.

You're not understanding their idea behind typedef. IMO, their lines for usage are extremely good when adhered to.

The note about integers is worth complaining about a bit. I agree there is merit to typedef'ing integers in some situations, but the kernel standard agrees with that in those instances (and the example is bad; there are instances of 'flag' typedefs in the kernel). In general, the note about integers is just there to discourage spamming typedefs everywhere.

More importantly than integers, though, their note about making opaque objects with a typedef is extremely good practice, as it makes it easy to distinguish when you are or aren't expected to access the members directly.

The point of those rules is to allow typedef to actually be useful and communicate some information. If you just allow typedef'ing everything in every situation, then whether or not something is typedef'd becomes useless information to the reader.

"Such advice for the kernel is even hypocritical, when that code uses size_t and off_t and many others quite liberally"

Did you even read their explanation? Apparently not.

This is an acceptable use of typedefs, as explained there, exactly because a size_t varies between architectures.

That's just rationalization. It's basically saying that some typedefs are OK because Linus is used to them, but he doesn't want to take the few seconds to figure out any new ones. The cases for typedefs shouldn't be treated as exceptions. The cases against them should.
"some typedefs are OK because Linus is used to them, but he doesn't want to take the few seconds to figure out any new ones."

NO

Because the wrong uses are more numerous than the right ones. It's that simple.

Creating typedefs for integers is mostly useless and causes confusion, except in the cases specified.

Of course, if you work with a small project it's easier than with a big project like the kernel.

And of course I admire Linus for cutting through BS and usually avoiding it.

I've had luck using single-element structs to distinguish between types of data when I'm throwing a lot of primitive types around. In my test with gcc, the generated code was identical to using the primitives directly, although the standard doesn't actually guarantee that and it's historically not been the case in some particular compilers (not sure which).
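Roughly what I mean, as a minimal sketch (row_idx, col_idx, and table_offset are hypothetical names for the example). Unlike plain typedefs of the same integer type, two distinct struct types are incompatible, so passing one where the other is expected is a compile error.

```c
#define TABLE_WIDTH 10u

/* Each wrapper is a distinct type, even though both hold an unsigned int. */
typedef struct { unsigned int v; } row_idx;
typedef struct { unsigned int v; } col_idx;

unsigned int table_offset(row_idx row, col_idx col)
{
    return row.v * TABLE_WIDTH + col.v;
}

unsigned int table_offset_example(void)
{
    row_idx row = { 2 };
    col_idx col = { 5 };

    /* table_offset(col, row); would be rejected by the compiler:
     * col_idx and row_idx are incompatible struct types. */
    return table_offset(row, col);   /* 2 * 10 + 5 */
}
```

The cost is the ".v" noise at every access, and, as said above, identical code generation is something to measure on your own compiler rather than assume.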
As a recreational C programmer I have the same impression. I've always used typedefs for structs and enums in my code and I think it makes it more readable and easier to work on. My reaction to reading the kernel style guidelines was surprise, and I am happy I am not the only one who disagrees.
As a "professional" C/C++ programmer (day job), I don't have strong feelings either way. It is frustrating not knowing what the type of a variable is. Am I being passed a pointer, or an integer, or a floating point value, or a whole struct, or what? This really matters! Digging up the definition isn't difficult, but isn't easy, either. I would lean against using typedefs liberally, but I don't feel strongly about it.

Personally I use typedefs as a shortcut. Rather than type boost::shared_ptr<const MyFavoriteClass> over and over, typedef it to ConstMyFavoriteClassPtr for convenience. Then be consistent with that paradigm through the whole project, so you only have to learn it once to know any given Ptr type.

Right, I imagine it's completely different in a project a lot of people are contributing to than in a 1-2 person thing where I am familiar with the whole codebase and define the typedefs myself.
I agree with you. I think structs help readability, especially when using function pointers within structures. Would it be out of line to suggest a new naming convention for struct typedefs and pointer typedefs, i.e. _t for typedefs and _tp for typedef pointers?
While I generally think typedefs for integer/enumerated types and structs/unions are a good idea, I also don't think the arguments I've made apply as much to typedefs for pointers. The difference between an X and a pointer/reference to X is often an explicit part of the contract between modules or functions. If that contract ever changes, the declarations and usage should change in ways beyond replacement of an identifier. That's different than if X itself changes, which usually can and should be transparent. You also get the same type checking for an "X pointer" declaration (avoiding star because of HN mis-formatting issues) as for its "X_ptr" equivalent. Even compilers will flag "pointer to wrong type" errors, even as they remain oblivious to many "wrong integer type" errors. In short, "X_ptr" typedefs don't help anywhere that "X" typedefs don't already.

I'm not going to argue against pointer typedefs, though I personally don't use them. I'm just saying that I can't make a strong argument for them as I believe I can for other cases.

The argument is that you don't need to resort to naming conventions since the language already supports differentiating them with the struct and the * markings. It's one of the things I fully support. I hate working on code with a billion typedefs for every struct.