by jrziviani 3735 days ago
It's possible to have 'this' == NULL. Consider the code:

  #include <iostream>

  using namespace std;

  class myclass
  {
      public:
          int sum(int a, int b)
          {
              cout << this << endl;
              return a + b;
          }
  };

  int main()
  {
      cout << ((myclass*)0)->sum(10, 11) << endl;
      return 0;
  }
This will print 0, then 21.

And it's not a null pointer dereference because, in C++, a non-virtual member function is resolved from the static type of the expression; nothing has to be read from the object to find it. It means that this code:

  ((myclass*)0)->sum(10, 11)
should become:

  sum(0 /*this*/, 10, 11)
and not

  0->sum(10, 11)
It may be UB according to the C++ specification, but under the System V ABI, 'this' is just a hidden first argument that is passed to 'sum' like any other parameter.
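
The rewrite described above can be sketched in ordinary, well-defined C++ by spelling the implicit 'this' out as an explicit first parameter. The names myclass_state and sum_lowered are made up for illustration; they are not what any compiler actually emits.

```cpp
#include <cassert>

// Hypothetical stand-in for the object; myclass has no data members.
struct myclass_state {};

// Roughly what a non-virtual member function looks like once 'this' is
// written as an ordinary first parameter. 'self' is never dereferenced.
int sum_lowered(const myclass_state* self, int a, int b)
{
    (void)self;  // unused, just like 'this' in a sum() that only adds
    return a + b;
}
```

Calling sum_lowered(nullptr, 10, 11) returns 21 and is well defined, because the null pointer is only passed around and never dereferenced. The member-call syntax ((myclass*)0)->sum(10, 11) carries no such guarantee, even if it happens to compile to similar instructions.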
3 comments

The problem with undefined behavior is that the compiler is allowed to assume that it will not happen.

First, within myclass::sum, the compiler can assume that "this" will never be null. In your example, you pass "this" to cout; if the iostream code is inlined, and it has a conditional on the passed pointer, for instance to special-case the output of a null pointer, the compiler will omit the comparison and output only the code for the non-null case.

Second, within the main function, the compiler can see that you are calling a method with a null "this" pointer. Since this is undefined behavior, it clearly can't happen, and there must be something earlier in the call path that leads to that part of the code never being executed. Therefore, since this is the first thing in the main function, the compiler can assume that main() will never be called, and replace it with an empty function (to satisfy the linker).

If your compiler doesn't do that, it only means that it's not smart enough yet. The behavior can change in later versions of your compiler, or if you use a different compiler.
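
One way to stay out of this territory, as a minimal sketch: test the pointer at the call site, before the member call, where the compiler has no license to drop the comparison. The helper sum_or and its fallback parameter are hypothetical names, not part of the original example.

```cpp
#include <cassert>

class myclass
{
    public:
        int sum(int a, int b) { return a + b; }
};

// Hypothetical helper: the null check happens on an ordinary pointer,
// outside any member function, so 'this' is never null inside sum().
int sum_or(myclass* p, int a, int b, int fallback)
{
    if (p == nullptr)
        return fallback;
    return p->sum(a, b);
}

// Convenience for the example: the same call through a real object.
int sum_on_live_object(int a, int b)
{
    myclass obj;
    return sum_or(&obj, a, b, -1);
}
```

With this shape, sum_or(nullptr, 10, 11, -1) yields the fallback and sum_on_live_object(10, 11) yields 21, and neither result depends on what a particular optimizer decides to do.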

I don't understand why my point was downvoted; anybody who likes to know how things work behind the scenes, or likes to hack on stuff, would find it interesting.

Sure, with undefined behavior the compiler can assume anything, but the compiler writer still has to respect the ABI, right?

I was reading the Itanium C++ ABI draft and the only thing I found about it is:

"If a class has a non-virtual destructor, and a deleting destructor is emitted for that class, the deleting destructor must correctly handle the case that the this pointer is NULL. All other destructors, including deleting destructors for classes with a virtual destructor, may assume that the this pointer is not NULL." https://mentorembedded.github.io/cxx-abi/abi.html

However, as you can see, I don't have an instance; there is no object, so no destructor will be called. That means compiler writers probably won't bother checking the 'this' pointer for null. Neither clang nor gcc does:

  % g++-4.9 -Wall -O3 -std=c++11 test.cpp -o test; ./test
  0
  21

  % clang++ -Wall -O3 -std=c++11 test.cpp -o test; ./test 
  0
  21

  0x400886 <main()>    push   %rbp
  0x400887 <main()+1>  mov    %rsp,%rbp                          
  0x40088a <main()+4>  mov    $0xb,%edx
  0x40088f <main()+9>  mov    $0xa,%esi
  0x400894 <main()+14> mov    $0x0,%edi
  0x400899 <main()+19> callq  0x400910 <myclass::sum(int, int)>
The 'this' pointer is just a parameter, passed in the EDI register (the low 32 bits of RDI).

NOTE: I don't recommend that anyone write code like this. I'm always concerned about UB and I agree with you. But it's still interesting to know how things really work.
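
For contrast with the ABI passage quoted earlier in this comment: at the language level, applying delete to a null pointer is defined to do nothing, which is presumably the case that rule about deleting destructors is there to support. A small sketch, with the counted type and both helper functions invented for illustration:

```cpp
#include <cassert>

// Hypothetical type that counts how often its destructor runs.
struct counted
{
    static int destroyed;
    ~counted() { ++destroyed; }
};
int counted::destroyed = 0;

// delete on a null pointer is well defined and runs no destructor.
int destructor_runs_when_deleting_null()
{
    int before = counted::destroyed;
    counted* p = nullptr;
    delete p;
    return counted::destroyed - before;
}

// delete on a real object runs the destructor exactly once.
int destructor_runs_when_deleting_object()
{
    int before = counted::destroyed;
    delete new counted;
    return counted::destroyed - before;
}
```

This is the one null-pointer case discussed here that the standard defines; calling a member function through a null pointer is not covered by it.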

This is still undefined behavior. Sometimes undefined behavior does what you want, but you can't count on it.
What happens if sum is a virtual function? Or does it not matter? I am not too familiar with C++ casting magic.
Most likely you'll get a segfault when it tries to read from page 0 to get to the class' vtable (which is where the references to the virtual functions live).
Yes, you'll get a segmentation fault. Consider this code, compiled with debug information:

  #include <iostream>

  using namespace std;

  class myclass
  {
      public:
          virtual int sum(int a, int b)
          {
              return a + b;
          }
  };

  int main()
  {
      myclass test;
      cout << test.sum(1, 2) << endl;
      cout << ((myclass*)0)->sum(10, 11) << endl;
      return 0;
  }
Debugging:

  (gdb) b main
  Breakpoint 1 at 0x4008ae: file test.cpp, line 16.
  (gdb) run
  Starting program: /tmp/test 

  Breakpoint 1, main () at test.cpp:16
  16	      myclass test;
  (gdb) ni
  17	      cout << test.sum(1, 2) << endl;
  (gdb) info vtbl test
  vtable for 'myclass' @ 0x400a30 (subobject @   0x7fffffffe410):
  [0]: 0x400974 <myclass::sum(int, int)>
Notice that the address of the function to call is not known at compile time; the compiler emits a lookup table, called the vtable, that determines which function is called based on the class instance.

  (gdb) disassemble main
  [snip]
   0x00000000004008ae <+8>: movq  $0x400a30,-0x10(%rbp) ; see the vtable address here
   => 0x00000000004008b6 <+16>:	lea  -0x10(%rbp),%rax ; ohh, the vtable address is the 'myclass test' instance*
   0x00000000004008ba <+20>:	mov    $0x2,%edx
   0x00000000004008bf <+25>:	mov    $0x1,%esi
   0x00000000004008c4 <+30>:	mov    %rax,%rdi ; pass this to RDI
   0x00000000004008c7 <+33>:	callq  0x400974 <myclass::sum(int, int)>
* This means that the first (and, in this case, only) field of myclass is the pointer to that vtable.

  (gdb) print test
  $1 = {_vptr.myclass = 0x400a30 <vtable for myclass+16>}
  (gdb) info register rax
  rax   0x7fffffffe410	140737488348176
  (gdb) x/x 0x7fffffffe410
  0x7fffffffe410:  0x00400a30
  (gdb) x/x 0x00400a30
  0x400a30 <_ZTV7myclass+16>:	0x00400974
Then, the compiler knows that it must call '[0]: 0x400974 <myclass::sum(int, int)>' function.

Compare with:

   ((myclass*)0)->sum(10, 11)
   0x00000000004008e5 <+63>:	mov    $0x0,%eax    ; the object address ('this') is 0 :(
   0x00000000004008ea <+68>:	mov    (%rax),%rax  ; tries to load the vtable pointer from address 0, which we cannot read - BOOOM
   0x00000000004008ed <+71>:	mov    (%rax),%rax
   0x00000000004008f0 <+74>:	mov    $0xb,%edx
   0x00000000004008f5 <+79>:	mov    $0xa,%esi
   0x00000000004008fa <+84>:	mov    $0x0,%edi
   0x00000000004008ff <+89>:	callq  *%rax
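
The two loads in the listing above can be modeled in plain C++ with a hand-written table of function pointers. This is only a sketch of the idea, not what g++ generates: the types object and vtable and the functions sum_impl, virtual_sum and virtual_sum_on_live_object are all hypothetical.

```cpp
#include <cassert>

struct object;  // forward declaration for the function pointer type

// Hypothetical hand-written "vtable" with a single slot.
struct vtable
{
    int (*sum)(const object* self, int a, int b);
};

// The table pointer is the first and only field, like _vptr.myclass.
struct object
{
    const vtable* vptr;
};

int sum_impl(const object* /*self*/, int a, int b)
{
    return a + b;
}

const vtable object_vtable = { &sum_impl };

// Model of a virtual call: read the table pointer out of the object,
// read the slot out of the table, then call through it. The first read
// touches memory at 'obj', so 'obj' must point to a real object.
int virtual_sum(const object* obj, int a, int b)
{
    const vtable* table = obj->vptr;  // corresponds to the first mov (%rax),%rax
    return table->sum(obj, a, b);     // second load, then callq *%rax
}

// The same call through a live object, which is the only valid use.
int virtual_sum_on_live_object(int a, int b)
{
    object test = { &object_vtable };
    return virtual_sum(&test, a, b);
}
```

Unlike the non-virtual case, the pointer itself has to be dereferenced before the call can even be made, which is why the null version faults at the first load instead of printing a result.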