| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ycmbntrthrwaway 3461 days ago

The result you get with this trick is signed, while the result you get with sizeof is unsigned.

Edit: Just to clarify, what you get is ptrdiff_t instead of size_t. So if array size is greater than PTRDIFF_MAX, you get undefined behavior [1].

[1] http://en.cppreference.com/w/c/types/ptrdiff_t

3 comments

tedunangst 3461 days ago

As far as I know, every compiler is badly broken with arrays greater than SIZE_MAX / 2, so this would be the least of your troubles.

link

pascal_cuoq 3460 days ago

Since I have it at hand, here is a list of examples of how compilers are broken if an array is larger than SIZE_MAX / 2 (which is called PTRDIFF_MAX in the post):

http://trust-in-soft.com/objects-larger-than-ptrdiff_max-byt...

link

mannykannot 3460 days ago

Is there a circle in hell reserved for C standards committee members who add to the number of cases where 'undefined behavior' occurs in the standards?

link

microtherion 3460 days ago

Just how would you make this case defined? The alternatives to undefined behavior tend to be (1) being silent about the issue and letting users and implementers find out themselves (2) defining the behavior in a way that makes it difficult to implement on some architectures or (3) defining the behavior in a way that imposes costs on all architectures. None of these is particularly attractive.

link

mannykannot 3460 days ago

I would go with option 1 because it does not suggest the problem is solved. As I write this, I realize that there is a counter-argument in that introducing ptrdiff_t gives you a place to issue a warning.

link

kbart 3460 days ago

Read "undefined behavior" as "depending on architecture and compiler". It's not like anything can happen, but it's simply not to describe every architecture and every compiler into a standard. Sure, somebody is free to write an implementation where a nuke is launched every time "undefined behavior" is encountered, and they would be right according to C standard, but in real world, you pretty much know what to expect on a given system.

link

DougBTX 3460 days ago

> "depending on architecture and compiler".

The standard also has the idea of "implementation defined" behaviour, which is close to the definition above. "Undefined behaviour" is a trickier beast, since compilers can rightly assume undefined behaviour never occurs, and optimise accordingly.

link

noselasd 3460 days ago

Not at all - that would be implementation defined behavior.

These days, compilers quite often speculate on undefined behavior, generating code as if the undefined part cannot happen - the result is that your code is going to do stuff you pretty much can not know or expect.

link

Asooka 3460 days ago

I would really love if undefined behaviour meant defined by architecture, but current compiler writers are firmly in the launch nukes camp.

link

michaelmior 3460 days ago

I hope not. One of the reasons for allowing undefined behaviour is to allow room for compilers to perform certain optimizations that may not be possible if a specific result were required.

link

flamedoge 3461 days ago

How likely do you run into array bigger than 2gb?

link

quotemstr 3461 days ago

The "how likely is it, really?" response to questions of technical correctness has always bothered me. It takes a mindset completely alien to mine to say "Here's a race condition. Sure, it's undefined behavior, but the race is narrow, so it's rare" or to say "Sure, memory allocation can theoretically fail, but in practice almost never does" or to say "fsync is too slow and most computers have batteries these days".

Software is unreliable enough as it is due to problems beneath our notice. It seems reckless to avoid fixing problems that we do notice. Sure, you could argue that rare problems are rare and that users probably won't notice them --- this attitude is penny-wise and pound-foolish, because you can't meaningfully reason about a system that's only probably correct.

link

Swizec 3461 days ago

Engineering is about tradeoffs. How many once-in-a-thousand bugs do you fix before you tackle the one-in-a-million? Or one in a billion? What about if it takes $10/bug to fix every 1:1000 bug and $100,000 to fix one 1:1000000 bug?

Correctness is great in theory, but in practice it's a matter of what's important.

link

mannykannot 3460 days ago

If you are only looking at probability and cost-to-fix you are overlooking something important - the cost if/when it happens.

link

verdax_1 3460 days ago

This is really emphasized in things like dmfea and other failure mode analysis documents or regulated industry. They want you to document the likelihood, your ability to recover from the failure, as well as the cost o the failure. You can say that you didn't want to pay for someone fixing some unlikely fail mode but that's small consultation to the people whose lives your product is ruining.

link

mjburgess 3461 days ago

The problem you're latching on to I think is how the context for caculating a probability can vary.

If it were really as likely as, say, the sun exploding that X happened then it would be of no use to expend time on X.

BUT very often people speaking about the probability of events given suspicious constraints. While a memory allocation might not fail in most situations it will fail often in some situations. And a one-in-a-million chance is almost guaranteed when there are millions of uses.

link

verdax_1 3460 days ago

Also worth considering that our processors are handling billions of ops per second. One in a million might be happening all the time even for one user.

link

spdionis 3461 days ago

That's why it's called one in a million...

link

maxlybbert 3460 days ago

https://blogs.msdn.microsoft.com/larryosterman/2004/03/30/on...

link

tholmes 3460 days ago

One in a million happens quite often if you're processing something like ~100k requests a second.

link

spdionis 3460 days ago

In fact I agreed with the parent and just posted a tongue in cheek remark.

One in a million literally means that at ~100k requests a second it will happen once every 10 seconds.

link

jschwartzi 3460 days ago

But it's extremely unlikely when you're processing 10 requests a week, such as might be the case for the web server in a consumer-grade router.

link

chongli 3460 days ago

One in a million isn't just a typical statement of probability, it's a colloquialism used to refer to things that never happen in practice. It's highly misleading to use in the context of computers which, due to their natures, have one in a million events occurring constantly.

link

spdionis 3460 days ago

My comment was tongue in cheek.

link

Dylan16807 3460 days ago

>he "how likely is it, really?" response to questions of technical correctness has always bothered me.

But the question is important in another context: language design. Why is this undefined behavior something that exists in the first place? Objects larger than PTRDIFF_MAX could just not be allowed! This avoids the problem and makes code easier to reason about, with pretty much no downside.

link

verdax_1 3460 days ago

I like the way you're thinking, but that sort of thing probably doesn't get past a committee. "Hey we might not be able to think of an application but that doesn't mean our users won't have a legitimate reason for doing it ... Motion passed."

link

cperciva 3461 days ago

A few months ago I was doing FFTs on arrays larger than 4GB. Amusingly, this uncovered a bug in the LLVM optimizer: It was looking at stride lengths to figure out if accesses were independent, and truncated a 4GB stride down to 0.

link

ksml 3460 days ago

I would be extremely interested to hear how you found this bug. Sounds like a difficult bug to track down, and I always learn from good debugging stories.

link

cperciva 3460 days ago

It was pretty easy to track down: clang38 was exiting with

    Assertion failed: (Distance > 0 && "The distance must be non-zero"),
    function areStridedAccessesIndependent, file /wrkdirs/usr/ports/devel/
    llvm38/work/llvm-3.8.0.src/lib/Analysis/LoopAccessAnalysis.cpp, line 1004.

Looking at the file it was easy to see what was being asserted, and to see that the type was a 32-bit integer; since I knew I was dealing with huge FFTs, the problem was obvious.

Let this be a lesson: Asserting that impossible things don't happen makes debugging much easier when they do happen!

link

flamedoge 3459 days ago

Well good thing it was ReleaseWithAsserts build!

link

vram22 3461 days ago

Not likely, but possible. This reminds me of the bug that was found in the binary search algorithm a few years ago, IIRC, in Java. The interesting thing is that binary search is probably one of the earliest-invented algorithms. Yet, in the book Writing Efficient Programs by Jon Bentley (which I mentioned in a recent HN comment), he says that in a class he taught to several industrial programmers with many years of experience, some had bugs in their implementations of binary search that he set them as an exercise. Not sure but I think I remember reading in the article about the Java binary search issue, that even his algorithm had the bug that was found in the Java version. Why it was not found earlier is (maybe) because it only occurred with an extremely large array, IIRC. Don't have a link right now, but it can probably be found by searching for the right phrase.

link

vram22 3461 days ago

Just did a google search, and it even partially auto-completed this search for me:

bug in java binary search

and showed a related search in the drop-down, 'programming pearls ...', a book by Jon Bentley, which seems to confirm what I said above (though I saw it in his other book, "Efficient Programs", IIRC - he might have mentioned the same issue in the Programming Pearls book too).

Edit: and the Wikipedia article confirms it too:

https://en.wikipedia.org/wiki/Binary_search_algorithm#Implem...

link

jbn 3461 days ago

I think this is it: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6412541

link

vram22 3460 days ago

Yes, that's it, by the desc. and dates shown.

link

pigz 3461 days ago

https://research.googleblog.com/2006/06/extra-extra-read-all...

link

vram22 3460 days ago

Yes, thanks, that closely describes the assignment I read that he gave.

link

loeg 3460 days ago

It's basically bogus to have a single object bigger or equal to half of address space (represented by size_t) in C. 32-bit platforms should detect and abort in such conditions (compiler/linker for static objects, malloc() implementation for dynamic allocations).

link

icebraining 3460 days ago

Why? If you're running a system with PAE, half of a 32-bit address space is a small fraction of the whole addressable memory.

link

loeg 3460 days ago

It's either addressable or it isn't. My understanding of typical PAE systems is that userspace is still limited to 32 bits of address space per process. Any system where userspace is not limited to 32 bits should have a larger than 32-bit size_t. (PAE systems are not true 32-bit platforms.)

link

dragandj 3461 days ago

Today, with ML, big data and similar applications, that might be often.

link

minipci1321 3461 days ago

Probably not very likely, but keep in mind that this method could also be used without actually allocating the array -- akin to the 'offsetof()' macro. (Which is undefined behavior.)

link

icedchai 3461 days ago

On a 64-bit platform (anything modern), ptrdiff_t is going to be 64-bit so this will not be an issue (ok, 63-bit... but you get my point.)

link

mannykannot 3460 days ago

In this context, the difference between 63 bits and 64 bits is not trivial.

link

yoklov 3460 days ago

Both are bigger than the x64 address space...

link

kabdib 3460 days ago

Often enough; "pack" files in video games are often many GB. Memory-map one of those and there you are . . .