by kazinator 888 days ago
There is no problem with memcpy other than that you can't use a null pointer. You can memcpy zero bytes as long as the pointer is valid. This works in a good many circumstances; just not circumstances where the empty array is represented by not having an address at all.

For instance, say we write a function that rotates an array of N bytes: it moves the low M bytes to the top of the array, and shuffles the remaining N - M bytes down to the bottom. This function will work fine with the zero-byte memmove or memcpy operations in the special case when N == 0, because the pointer will be valid.
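
A minimal sketch of such a rotate function, assuming an n-byte array and a rotation count m with m <= n. The name rotate_bytes and the scratch-buffer approach are just for illustration. Any of the memcpy/memmove calls may be asked to move zero bytes, and that is fine as long as arr itself is a valid pointer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Rotate an n-byte array: the low m bytes move to the top and the
   remaining n - m bytes slide down to the bottom.
   Returns 0 on success, -1 if the scratch allocation fails. */
int rotate_bytes(unsigned char *arr, size_t n, size_t m)
{
    assert(arr != NULL);   /* the array must have an address, even if n == 0 */
    assert(m <= n);

    /* Scratch space for the low m bytes. Ask for at least 1 byte so that
       a null return always means failure, never "malloc(0) gave null". */
    unsigned char *tmp = malloc(m ? m : 1);
    if (tmp == NULL)
        return -1;

    memcpy(tmp, arr, m);              /* zero bytes when m == 0 */
    memmove(arr, arr + m, n - m);     /* zero bytes when m == n */
    memcpy(arr + (n - m), tmp, m);    /* zero bytes when m == 0 */

    free(tmp);
    return 0;
}
```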

Now say we have something like this:

  struct buf {
    char *ptr;
    size_t size;
  };
We would like it so that when the size is zero, we don't have an allocated buffer there. But we'd like to support a zero-sized memcpy in that case: memcpy(buf->ptr, whatever, 0), or likewise in the other direction.

We now have to check for buf->ptr being null in the code that deals with resizing.
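
One way to write those checks, as a sketch. The helper names buf_write and buf_resize are made up here; the idea is only that an empty buf holds a null ptr, so the zero-size case returns before memcpy ever sees it:

```c
#include <stdlib.h>
#include <string.h>

struct buf {
    char *ptr;      /* NULL when size == 0 */
    size_t size;
};

/* Copy n bytes from src to the start of the buffer.
   Returns 0 on success, -1 if n exceeds the buffer size. */
int buf_write(struct buf *b, const void *src, size_t n)
{
    if (n > b->size)
        return -1;
    if (n == 0)
        return 0;           /* never hand a null pointer to memcpy */
    memcpy(b->ptr, src, n);
    return 0;
}

/* Resize the buffer; an empty buffer owns no allocation.
   Returns 0 on success, -1 on allocation failure (buffer unchanged). */
int buf_resize(struct buf *b, size_t new_size)
{
    if (new_size == 0) {
        free(b->ptr);       /* free(NULL) is a no-op */
        b->ptr = NULL;
        b->size = 0;
        return 0;
    }
    char *p = realloc(b->ptr, new_size);  /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    b->ptr = p;
    b->size = new_size;
    return 0;
}
```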

Here is a snag in the C language related to zero sized arrays. The call malloc(0) is allowed to return a null pointer, or a non-null pointer that can be passed to free.

Oops! In the one case, the pointer may not be used with a zero-sized memcpy; in the other case it can.

This also goes for realloc(NULL, 0) which is equivalent to malloc(0).

And, OMG I just noticed ...

In C99, this was valid: realloc(ptr, 0), where ptr is a valid, allocated pointer. You could realloc an object to zero.

I'm looking at the April 2023 draft (N3096). It states that realloc(ptr, 0) is undefined behavior.

When did that happen?


N2464 [0]: there was lots of implementation divergence on what realloc(ptr, 0) did (especially with BSD, which allegedly doesn't free the memory at all?), so they just declared it UB.

[0] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf

I read this as "there were a lot of buggy implementations of the C standard, so we imported the bug into the standard". Crazy. Don't make the language less defined going forward.

When those implementations eventually pick up C23, they surely could fix the bug as well. At best this should have been an erratum/defect for the previous standard, so that the previous standards document the behavior of implementations of said standards.

The BSD people don't understand what little standards they do read. It's unfortunate that we have to spoil the language for their sake.

The requirements in C99 and before are perfectly clear. realloc is described as liberating the old pointer, and then allocating a new one as if by malloc. (Except that it magically has access to both objects so it can transfer the necessary bytes that must be transferred from the old to the new.)

It is perfectly clear what happens when size is zero. No byte can be copied from the old object, if any. The behavior is like free(oldptr) followed by return malloc(newsize).

Your IQ would have to be well below 85 to misunderstand the requirements.

And those requirements are still there; there is still the description of realloc in terms of freeing the old pointer and allocating a new object with malloc.

There was no need to insert a gratuitous removal of definedness for the size zero case, given that malloc handles it.

Applications now have to do this:

  void *sane_realloc(void *ptr, size_t size)
  {
    if (size == 0) {
      // behave literally as required in C99
      free(ptr);
      return malloc(0);
    }

    return realloc(ptr, size);
  }
Supposedly because a few vendors were not able to code this logic in their realloc functions?

Just a reminder to myself how happy I am to leave this academic level of snobbery behind and migrate to languages that help get things done and save you from nonsense like this. I did C for about ten years. Hope I’ll never have to do a single line of it again.

> this academic level of snobbery

...is actually very uncommon in the C world (at least much less common than in the C++ or Rust world).

Specific compilers and stdlibs have specific behaviours which may or may not fully agree with the details written down in the standard. You'll have to pick a subset of compilers you want to support and write your code against that subset of compilers. It's not perfect, but also not much of a problem. It's simply the reality when there are multiple compiler implementations.

In theory, it's really not much different than developing some website that has to run on mobile devices and desktops and all manner of browser implementations and versions and OSes and screen sizes and be accessible to screen readers as well. Or is JavaScript not a "language that helps get things done"?

In fact, I'll take the stable, well defined compilers that I can actually test against any day.

C99 and C11 have no special treatment for a size of zero. Since "memory for the new object [of size zero] cannot be allocated, the old object is not deallocated and its value is unchanged." (emphasis added). This is exactly what BSD does.

C17 says "If size is zero and memory for the new object is not allocated, it is implementation-defined whether the old object is deallocated" (emphasis added).

What standard, exactly, is BSD violating?

Such behavior violates C89 (they later made it unspecified, likely to match the divergence):

> If size is zero and ptr is not a null pointer, the object it points to is freed.

It also violates every version of the SUS and POSIX up to Issue 6 (they made it unspecified in Issue 7):

> If size is 0 and ptr is not a null pointer, the object pointed to is freed.

Going down to at least the 3rd edition of SVID:

> If size is zero and ptr is not a null pointer, the object pointed to is freed.

What's obvious about all those Unix-related specifications is that they are not of BSD lineage.

That all refers to a failing allocation, obviously. If realloc cannot allocate the new object, the old one stays valid.

In the case of size zero, malloc(0) can return null (always). That is ambiguous; it looks like a failure. If realloc literally were to call malloc(0) and then treat the null as a failure, it would return the old object. However, on an implementation where malloc(0) always returns null by design, that would obviously be poor behavior for its own realloc.

If malloc(0) returns null by design that is not a case where allocating an object failed.

We basically now need this in every program that might resize to zero:

  void *sane_realloc(void *ptr, size_t size)
  {
    if (size == 0) {
      free(ptr);
      return malloc(0);
    }
    return realloc(ptr, size);
  }
The only problem is that if malloc(0) returns null on an implementation where it normally returns an allocated pointer (and thus the call has failed) we don't detect the failure and don't preserve the original object. The application which relies on sane_realloc has to understand that when size is zero, the deallocation always works, whether or not the subsequent malloc does, and so it may get a null pointer on any platform.

I don't think insulting the BSD folks is very nice. They probably had good reasons for their decisions.

Mostly, the BSD people think that POSIX and ISO C are just forks of Unix documentation that are mainly used for making non-Unix systems look Unix-like. BSD comes from real Unix DNA and so following those is optional; whatever direction BSD takes is Unix by definition, as if it's forever 1983.

Interesting. I wonder if that means BSD's compiler toolchain has ignored the strict-aliasing misfire that makes such a mess of the ISO-associated ones.

"BSD compiler toolchain" is just a GNU toolchain that is a decade plus old.

It will probably be twenty years before someone programming on BSD has some code wrongly deleted as unreachable because it comes after realloc(ptr, 0).

They are immune from the damage.

For instance, OpenBSD 6.8 was released in 2020. On that release, gcc reports as 4.2.1, released in 2007.

Is there a rationale for a memory allocator to support zero sized allocations? Is this really just about providing a "technically" valid pointer for the pointer/size pair structure? To me it seems any address is a potentially valid pointer to a zero-sized object. Do allocators really keep track of these null allocations? That would require keeping state for every single address in the worst case...

It's very strange. I wrote my own memory allocator and I can't figure out the right way to handle this. Eliminating the need for these "technically" valid pointers that can't actually be accessed because they're zero sized seems like the better solution.

> When did that happen?

More importantly, why did that happen? People have told me that I should care about the C standards committee because they take backwards compatibility very seriously. Then they come out with breaking changes like these.

> Is there a rationale for a memory allocator to support zero sized allocations?

Mainly, that it has supported that before and programs rely on it.

Programs written to the C99 standard can resize a dynamic vector down to empty with a resize(ptr, 0). The pointer coming from that will be the same as if malloc(0) had been called.

So now, that has been taken away; those programs can now make demons fly out of your nose.

Thank you, ISO C!

> Do allocators really keep track of these null allocations? That would require keeping state for every single address in the worst case...

Implementations of malloc(0) that don't return null are required to return a unique object. To do that, all they have to do is pretend that the size is some nonzero value like 1 byte. (The application must not assume that there is any byte there that can be accessed).
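
The "pretend it is 1 byte" trick, as a tiny sketch. malloc_unique is a hypothetical wrapper name, not a standard function:

```c
#include <stdlib.h>

/* Never passes 0 to malloc, so a null return always means failure and
   every successful call yields a distinct pointer that can go to free.
   For size == 0 the caller must still not read or write through it. */
void *malloc_unique(size_t size)
{
    return malloc(size ? size : 1);
}
```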

> Programs written to the C99 standard can resize a dynamic vector down to empty with a resize(ptr, 0).

C99 has no resize() function. Assuming you mean realloc(), C99 does not guarantee you can use realloc() in this manner.

See also:

https://news.ycombinator.com/item?id=38850575

https://stackoverflow.com/questions/16759849/using-realloc-x...

https://wiki.sei.cmu.edu/confluence/plugins/servlet/mobile?c...

https://developers.redhat.com/articles/2023/07/26/checking-u...

> C99 does not guarantee you can use realloc() in this manner

Yes it does. It requires support for reallocing down to zero, which results in an object that is like one that comes from malloc(0).

(What some people think is that realloc(x, 0) is equivalent to free(x). It isn't. Resizing down to zero isn't freeing. It might be, if malloc(0) doesn't allocate anything and just returns null. Why some people think realloc(x, 0) is free(x) is that they read the realloc man page from the Linux man-pages project, which says such a thing.)

realloc(ptr, 0) could fail to free ptr, in the situation that allocating the zero-sized replacement object fails. In that case, null could be returned, leaving the old object valid. This is ambiguous, because null could also be the happy case return value when the old object was freed and the zero-sized allocation deliberately produced null. Under those conditions, the cases in which there is a memory leak are indistinguishable from the ones in which there isn't.

(I'd rather suffer a memory leak in the OOM condition, than have previously defined behavior gratuitously flip to undefined.)

I literally quoted someone from WG14.

HN really needs the ability to block trolls.

You quoted someone who stated they are not happy with C23, and deflected personal blame for that issue.

If I'm only trolling, then why, having learned about this, am I having to go into code and make defensive fixes?

Indeed, you couldn't reliably free() the old pointer if the realloc(ptr, 0) failed.

But xrealloc(ptr, 0) (or equivalent) would still be perfectly consistent, assuming you trust your implementation to support non-null 0-size allocations in the first place. It's very common to just "leak it all and abort" on a critical error like memory exhaustion. There's a reason most non-C languages expose an infallible allocation API as the default option.

I do think that UB is an overly heavy hammer for realloc(ptr, 0), since the xrealloc(ptr, 0) use case works just as well regardless of how unspecified the values of the old pointer or errno are on failure.
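
For reference, a sketch of what such an xrealloc might look like. This is not any particular library's implementation, and it makes one substitution of its own: a size of 0 is bumped to 1, so realloc is never actually called with size 0 and a null return can only mean failure:

```c
#include <stdio.h>
#include <stdlib.h>

/* "Infallible" realloc in the leak-it-all-and-abort style.
   Assumption of this sketch: size 0 is replaced by 1, which sidesteps
   realloc(ptr, 0) entirely and always yields a non-null pointer. */
void *xrealloc(void *ptr, size_t size)
{
    void *p = realloc(ptr, size ? size : 1);
    if (p == NULL) {
        fputs("xrealloc: out of memory\n", stderr);
        abort();
    }
    return p;
}
```

The caller never inspects the old pointer after the call, so whatever state it would be in on failure doesn't matter.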

Yes. If realloc(ptr, 0) returns a null pointer, you don't know whether that's due to a failure (in which case ptr is still valid) or whether it's the happy case (ptr was freed, and the zero-sized request for replacing it produced a null). Thus you don't know whether ptr is still a valid pointer. If it's valid and you treat it as invalid (hands off), that's a leak. If it's invalid and you treat it as valid (free it), that's a double free.

I'm not talking about implementations that produce a 'successful' null pointer. I'd consider that a quality-of-implementation issue, in that implementations are responsible for returning non-null on 0-size success in the same way they're responsible for not just stubbing out every single malloc() call, so just assuming that a null output indicates failure is appropriate. (Implementations transitioned ages ago toward returning non-null for 0-size requests for good reason!)

Instead, the problem is about a realloc(ptr, size) that returns null to indicate failure. If size > 0, then the data behind ptr remains unmodified and can be later freed. But if size == 0 (and the 0-size allocation fails), then the data behind ptr is unconditionally freed according to many implementations.

This makes it unsafe to access the data behind ptr after a realloc() failure, unless you've checked that size > 0. But I argue that by making the whole thing UB instead of leaving it sufficiently unspecified, the xrealloc(ptr, size) use case that doesn't care about the leak on failure is made more complicated unnecessarily.

If storing the metadata in the heap, 0 bytes often doesn't even end up a special case. You need to have a case for allocations of some arbitrary number of bytes, and 0 is an arbitrary number of bytes.

Another option is to treat them as being of size 1.

(In theory you could do endless allocations of size 0, and eventually you'd run out of space, even though you've allocated 0 bytes in total. But you end up in exactly that situation, whatever the allocation size, if you don't take bookkeeping overhead into account!)
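
To make the bookkeeping point concrete, here is a toy bump allocator. It is purely illustrative: bump_alloc, ARENA_SIZE and the layout are all assumptions, and there is no free. Each block carries a size header in front of its payload, so a 0-byte request takes the same path as any other and still gets a distinct address; enough of them exhaust the arena on header overhead alone:

```c
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 4096

static _Alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used;

/* Round n up to a multiple of the strictest fundamental alignment. */
static size_t round_up(size_t n)
{
    const size_t align = _Alignof(max_align_t);
    return (n + align - 1) / align * align;
}

/* Allocate size bytes from the arena; returns NULL when it is full. */
void *bump_alloc(size_t size)
{
    const size_t hdr = round_up(sizeof(size_t));

    if (size > ARENA_SIZE)              /* also keeps round_up from overflowing */
        return NULL;

    const size_t need = hdr + round_up(size);
    if (need > ARENA_SIZE - arena_used)
        return NULL;

    unsigned char *block = arena + arena_used;
    memcpy(block, &size, sizeof size);  /* header: the requested size */
    arena_used += need;
    return block + hdr;                 /* payload; may be 0 bytes long */
}
```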