Hacker News new | ask | show | jobs
by lionkor 1164 days ago
If someone checked in that code, it would definitely fail my code review. I understand back in the day it was different, but today there should be a lot of named intermediates. Additionally, `long` and any such keywords should not make it into any commit unless the commit explains 1) why its needed and 2) how, with any standard conforming implementation, it couldnt possibly cause a bug.

As always in C programming, the bugs arise from people doing stuff that any sane guideline tells them to not do.

4 comments

Named intermediates for what? This function munges a single hash value, h, and then folds its bits 24 to 31, g, into bits 4 to 7. It implements a mathematical formula, basically, and given it’s a hash, there isn’t much meaning to the contents of any pair of parens in that formula. Perhaps *name++, but any C programmer just thinks “next character of name” when they see that, don’t they?

(I’ve seen people code with a lot of intermediates with two- and three-word names, and I just don’t see why, in general. But here especially—hashes are not exactly oases of meaning.)

The 'unsigned long' point is correct in a cross-platform context, of course. If the code is from the SunOS linker, then its authors defined the ABI, so could guarantee 'long' was 32 bits. It’s the Glibc port that was careless.

(Although if I heard the phrase “any standard-conforming implementation” in this context, I’d be tempted to point out that an implementation doesn’t have to provide uint32_t at all, though a POSIX 2008 one does, and may have 33-bit ints, automatically promoting the uint32_t to signed int and immediately hitting UB on overflow. Either use C23 _BitInt(32), not subject to promotions for precisely this reason, or add +0U as needed to force the hypothetical signed int to unsigned.)

if its a mathematical formula, a comment would do it, too
For all its advantages, C is unfortunately so ripe with stuff that any sane guideline would recommend not to do that it can hard to follow through.

Though I agree in this case this would never have passed a modern review.

One only needs to compare C programming manuals with the programming manuals from systems programming languages being developed outside Bell Labs.

Also note that C author's were naturally aware of these issues and created lint in 1979.

Now getting people to use such tooling is another matter, apparently 50 years weren't enough.

There is a lovely piece of text in the third version of PNG specification: "PNG four-byte unsigned integers are limited to the range 0 to 2^31-1 to accommodate languages that have difficulty with unsigned four-byte values" [0]. Gee, I wonder what languages those may be?

[0] https://www.w3.org/TR/2022/WD-png-3-20221025/#7Integers-and-...

The answer is Java, not C(++).
C has no problem dealing with a uint32_t. Not sure what you are getting at.

This is more of an issue with languages like java that abstract away integer widths and signs, which is convenient if you're only doing arithmetic but becomes a huge pain when dealing with binary data.

The sized uint sure, I get that.

But the rest of the code? I guess you could make it so the dereference and pointer-increment happen separately. And for someone unfamiliar you could expand out the loop condition.

But ultimately it's a hash function. How would you write it?

Another perspective is that easily preventable bugs in C arise because the compiler doesn’t stop people from doing them.
I don't understand what it is you expect the compiler to stop. There's nothing actually wrong with the code. It's just written in a platform-dependent manner.
The base integer types have always been open ended since C89. It has always been wrong to assume an exact size when writing code that depends on modular wraparound. The code is wrong.

Prior to stdint.h there was no portable way to address this but new code should be written using the type system as intended. Open ended when the minimum is sufficient and larger storage is inconsequential. Exact size when the algorithm requires it.

The code isn't wrong, it's platform dependent. It's the assumption that it that is wrong. The code itself if is fine.
If the assumptions made by the code are wrong, the code is wrong.

I would agree with you if the code would preprocessor-check ULONG_MAX and #error out if it isn’t the expected value.

The code isn't making assumptions, the programmer is.
But if the code was written with intention to be platform-independent, and turned out to be platform-dependent... it's the programmer who wrote it is wrong, not the code itself? Or do I misunderstand you completely?
Right.

Let's say I know the Swedish language, in which "öl" means "beer", and I travel to Germany to partake in the Oktoberfest celebrations. I arrive and in Germany and immediately try to order a beer and try to say "One Beer Please", but not mastering the language I say "Ein Öl Bitte" (meaning "one oil please"). I don't think the that's a bug in the German language, and the sentence itself is perfectly good German. It just doesn't say what I think it does.