1. No, dereferencing a null pointer will not "cause a sigsegv". It causes UB. In practice, in unix user space, yes it'll probably be SIGSEGV.
2. A null pointer is not a valid pointer: Yeah… Once again my question was "But when would it not be a valid pointer, and yet also not a null pointer? A null pointer we can check for easily."
This code will NEVER dereference a null pointer. Not under any compiler, not with any compiler options:
if (ptr != NULL) { *ptr = 0; }
> A null pointer is not a valid pointer in a predominant number of systems in existence.
No, that's not quite pedantically accurate. A null pointer is not a valid pointer in the C programming language. Address zero may or may not be, that's outside the scope of the C language. Which is why embedded and kernel work sometimes has to be very careful here.
> They may not have a MMU, and in such a case the operation will succeed.
Lack of MMU does not mean address zero is valid. It definitely doesn't make a null pointer valid. In fact, a null pointer may not point to address zero.
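To make the "may not point to address zero" point concrete, here is a minimal sketch that inspects the object representation of a null pointer on whatever implementation compiles it. The function name `null_is_all_zero_bits` is made up for illustration. On mainstream desktop and server platforms it will report all-zero bytes, but the C standard does not promise that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if a null int pointer is stored as all-zero bytes on this
   implementation. The C standard does not require that; it is only the
   common case. */
bool null_is_all_zero_bits(void)
{
    int *p = NULL;
    unsigned char bytes[sizeof p];

    /* Whatever its bits are, p compares equal to the constant 0. */
    assert(p == 0);

    /* Copy out the raw bytes of the pointer object and inspect them. */
    memcpy(bytes, &p, sizeof p);
    for (size_t i = 0; i < sizeof bytes; i++) {
        if (bytes[i] != 0)
            return false;
    }
    return true;
}
```

The comparison `p == 0` is about the language-level null pointer; the byte loop is about the hardware-level representation. The two are allowed to differ.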
A zero (0, not NULL!) pointer is a valid pointer in C/C++. It is not UB, and it means one simple thing: «give me the contents of a memory cell (a byte, a word, a long word etc) at the address of 0». Old hardware designs used the address of 0 to store a jump address of the system boot-up sequence (i.e. firmware), and I personally wrote the code in C to inspect / use it in the unprivileged hardware mode.
The vast majority of modern systems do not map the very first virtual (the emphasis is on virtual) memory page (the one that starts from zero) into the process address space for pragmatic reasons – an attempt to dereference a zero pointer is most assuredly a defect in the application. Therefore, an attempt to dereference a zero pointer always results in a page fault due to the zeroth memory page not being present in the process' address space, which is always a SIGSEGV in a UNIX.
Embedded systems that do not have a MMU will allow *ptr where «ptr» is zero to proceed happily. Some (not all) systems may even have a system-specific or a device register mapped at address 0.
You are conflating several unrelated things, and there is no pedantry involved – it is a very simple matter with nothing else to debate.
> it means one simple thing: «give me the contents of a memory cell (a byte, a word, a long word etc) at the address of 0»
Well… sometimes. If you set a pointer to literal 0, you do not actually make that pointer point to address zero, from the C language's point of view. No, you are then setting it to be the null pointer. (C99 6.3.2.3 paragraph 3)
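A small sketch of what that paragraph guarantees, as I read it: converting the constant 0 to a pointer type yields a null pointer, and a null pointer compares unequal to a pointer to any object. The function name is hypothetical; only the comparisons matter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Checks the guarantees of C99 6.3.2.3p3 for the constant 0. */
bool zero_constant_yields_null(void)
{
    int object = 42;
    int *from_literal = 0;        /* null pointer constant, converted */
    int *from_cast = (int *)0;    /* same conversion, spelled with a cast */
    int *to_object = &object;

    /* A pointer to a real object is never a null pointer. */
    assert(to_object != NULL);

    return from_literal == NULL
        && from_cast == NULL
        && from_literal != to_object;
}
```

Note that nothing here says which address, if any, the null pointer refers to. It only says how it compares.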
Now, what is the bit value of a null pointer? The standard doesn't specify it.
So how do you even set a pointer to point to address zero? In the C standard, maybe if you set an intptr_t to 0 and then cast it to the pointer? Actually I don't know how null pointer interacts with intptr_t 0. Is intptr_t even guaranteed to contain the same bit pattern? I don't see it. All I see is that it's guaranteed to convert back and forth without loss. For all I can find in the spec, converting between intptr_t and pointer inverts the bits.
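Here is a rough sketch of the two conversions in question, with made-up function names. The round trip is the part the standard guarantees (for implementations that provide the optional `intptr_t` at all). The second function converts a run-time zero, which is not a null pointer constant, so its result is implementation-defined and I deliberately make no claim about what it compares equal to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Guaranteed: a valid void pointer survives a trip through intptr_t. */
bool round_trip_preserves(void *p)
{
    intptr_t as_int = (intptr_t)p;
    void *back = (void *)as_int;
    return back == p;
}

/* Implementation-defined: `zero` is a variable, not an integer constant
   expression, so this is an ordinary integer-to-pointer conversion. */
void *pointer_from_runtime_zero(void)
{
    intptr_t zero = 0;
    return (void *)zero;
}

/* Exercises the guaranteed case with the address of a local object. */
void demo_round_trip(void)
{
    int object = 0;
    assert(round_trip_preserves(&object));
}
```

On the usual flat-address-space platforms `pointer_from_runtime_zero()` happens to give back a null pointer, but that is a property of those implementations, not of the language.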
A null pointer "is guaranteed to compare unequal to a pointer to any object or function".
Did you put an object or function at address zero? Sounds pretty UB to me.
> modern systems […] SEGV
I already agreed with you on this. I mean… now modern systems don't let applications map address zero (actually, is that always true? I know OpenBSD stopped allowing it after some security holes. I'm too lazy to check if Linux did too)
In any case, this is a fix that's only like 10 years old (or I'm old and it's actually 20). It used to be possible.
> Embedded systems that do not have a MMU will allow *ptr where «ptr» is zero to proceed happily.
This is absolutely not true. An embedded system could have I/O mapped at address zero that reboots the machine on read or write. And that'd be perfectly fine for the C language spec, since C doesn't allow dereferencing a null pointer.
MMU is not the only way memory becomes magic. In fact, it's probably the LEAST of the magic memory mapping that can happen.
> with nothing else to debate.
I mean… you're just wrong. I'm not conflating unrelated things. I'm correcting multiple unrelated mis-statements you made.
To add the things up though: Let's say you intend to read from address zero, so you do `char* ptr = 0; something(*ptr);`. C standard would allow this to set ptr to 0xffff, and reading from that address starts the motor. The C standard doesn't say. It just says that assigning 0 sets it to null pointer, which on some systems is 0xffff.
I've certainly worked on embedded stuff that "did stuff" when an address was read. Sometimes because nobody hooked up the R/W pin, because why would they if the address goes to a motor where "read" doesn't mean anything anyway?
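For anyone who hasn't done this kind of work, a hedged sketch of what such an access usually looks like in C. `read_reg` and `MOTOR_REG_ADDR` are hypothetical names, not from any real part or SDK, and the demo uses an ordinary variable standing in for the hardware so that it runs on a normal hosted system.

```c
#include <assert.h>
#include <stdint.h>

/* Performs exactly one read of a device register. `volatile` tells the
   compiler the access itself matters, so it may not drop or merge it,
   which is the whole point when a read triggers something on the device. */
static inline uint8_t read_reg(volatile const uint8_t *reg)
{
    return *reg;
}

/* On a real board the caller would pass a fixed address from the
   datasheet, e.g. read_reg((volatile const uint8_t *)MOTOR_REG_ADDR);
   MOTOR_REG_ADDR is a placeholder and is intentionally not defined here. */

/* Hosted stand-in: plain RAM plays the role of the register. */
uint8_t demo_with_fake_register(void)
{
    uint8_t fake_reg = 0x5a;
    uint8_t value = read_reg(&fake_reg);
    assert(value == 0x5a);
    return value;
}
```

The C code is the same whether the address is backed by RAM or by a motor controller. What happens on the bus is decided by the board, not by the language.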
You are conflating the language capability with the hardware capability. C/C++ do not place restrictions on dereferencing the 0th address. Consider the following stub:
/* -O2 -std=c23 -Wall -fno-inline-functions */
int *ptr0 = 0;
int *ptr0p = (int *)0;
int
main ()
{
return *ptr0 | *ptr0p;
}
Head over to godbolt, compile it, and check the code. Zero compilation warnings, and the compiler dutifully generates code that accesses a memory cell at the address 0x0 on all architectures that godbolt supports (ARM, RISC-V, SPARC64, POWER64, TI, S390 and others – with no exceptions).
So if you run that code on a system before the MMU is activated or on a system without a MMU, «main» will return 0 on all systems[0] (if the memory is initialised with zeroes). You do have a point that some embedded systems[1] may have device registers mapped at 0, but that has no bearing on the generated code – it will still attempt to read the 0th address.
You can also test the generated code in QEMU on an architecture of your choice in the «bare metal mode» (i.e. memory protection off) and observe that a read from 0 will give you 0 if the first memory page is filled with 0s.
You are most assuredly conflating a pointer to 0 dereferencing with the memory protection/virtual memory management system, and the explanation is in the first answer. It is Linux that implements a kernel-level check in mmap(2) on the address to mmap into, not the hardware. It is a Linux-specific quirk, and other UNIXes will allow the mmap to 0 to proceed but reading from 0 will still yield a SIGSEGV due to memory protection being in use.
> MMU is not the only way memory becomes magic. In fact, it's probably the LEAST of the magic memory mapping that can happen.
MMU is not magic. It is a simple and very efficient design that works in concert with the microarchitecture it has been implemented for – CPU traps, memory page descriptors and tables.
> I mean… you're just wrong. I'm not conflating unrelated things. I'm correcting multiple unrelated mis-statements you made.
Respectfully, I have yet to see a single compelling argument or tangible piece of evidence to support the claims you have espoused. I have provided a few very concrete and specific examples as supporting evidence, but I am not seeing the same on your side.
[0] The only exception that does not initialise memory with zeroes that I am aware of is AIX (but not POWER/PowerPC that it runs on!) – the AIX VMM initialises a new memory page upon allocation with 0xdeadbeef to make uninitialised pointers forcefully crash the process. Linux and the *BSDs running on POWER/PowerPC do not do it; it is an AIX-specific quirk.
[1] Again, embedded may have a nuance (subject to a specific hardware implementation) as it is commonplace in embedded systems not to have a contiguous memory space and to have holes in it, including at the zeroth address. It does not preclude the generated code from attempting to access 0, though, if the hardware supports it.