Hacker News new | ask | show | jobs
by kkowalczyk 4540 days ago
Hoare didn't invent 0-valued pointer. It was there since the beginning of time. Or at least the beginning of the CPUs.

People talk about getting rid of nil like it's actually possible.

It's not.

If you have pointers, they sometimes have to start the life uninitialized (i.e. with a value of 0) hence nil pointers.

As you admit yourself, the proposed solutions don't actually get rid of anything. At best they can force you to handle nil value by wrapping it in some wrapper.

Guess what - if you have that kind of discipline, you can do the same in C++.

Why aren't people doing that?

Because it comes at a cost. There's a cost to the wrapper and when it comes down to it, you still have to write the code to handle the nil pointer whether you're using a wrapper or a raw pointer.

It just doesn't buy you much.

Finally, fixing crashes caused by referencing a null pointer is downright trivial. Spend a few weeks chasing a memory corruption caused by multi-threading and you'll come to a conclusion that fixing a null pointer crashes (which can be done just by looking at the crashing callstack most of the time) is not such a big deal after all.

6 comments

> If you have pointers, they sometimes have to start the life uninitialized (i.e. with a value of 0) hence nil pointers.

Maybe at the machine level. But there's nothing that stops a programming language a few levels above machine from requiring that every pointer be initialized with a reference.

> As you admit yourself, the proposed solutions don't actually get rid of anything. At best they can force you to handle nil value by wrapping it in some wrapper.

The entire point is that a large majority of APIs don't WANT to accept or handle NIL, but have to, because the default is to allow it. And in some languages, such as Java, the only way to extend the type system is with reference types, making it impossible to ever not have to handle NIL. By reversing this decision, it becomes possible to specify both allowance and disallowance of NIL-valued parameters. Why you would ever argue against such expression is beyond me.

> Maybe at the machine level. But there's nothing that stops a programming language a few levels above machine from requiring that every pointer be initialized with a reference.

What if the pointer is to a large structure (expensive to initialize), and I want a function that returns a pointer, which may fail to initialize?

Without a nil, developers would create structures that have a "valid" field. Nil just makes that more convenient, and the way Go does it is pretty good -- you can't cast it the same way you can in C/C++.

If you want to be able to return "pointer or null", any decent language will let you do that, sure. But it's good to have the option of saying "this function will never return null".

Think of it this way: functions should document whether (and under what conditions) they return null, right? What if the compiler could check the accuracy of that documentation, so it would be an error to return null from a function that said it didn't return null, and a warning to document a function as possibly-returning-null when it never did? (And once you had that, surely you'd want a warning when you accessed a possibly-null thing without checking whether it was actually null?)

So basically you want the compiler to solve the halting problem for every function it compiles.
Ideally yes. But keeping track of nullability is not impossible, it's not even hard, as any number of existing languages with such checking prove (some of which have very performant compilers).
With a Maybe/Option type, you are forced to always deal with the possibility of Nothing/None.

And even better, you can use monadic bind to chain together several actions on possibly-nullable things and get either a value or a null at the end.

An argument can be made that this merely masks the problem : it is certainly possible for a Haskell function to crash (and therefore not return anything), no matter it's declared type :

reallyfun xs = reallyfun xs ++ [ 1 ]

Note that this will OOM, not just run infinitely long. There are plenty of ways you can cause this sorts of issues. So you can't trust Haskell functions to always return their declared type either.

The million-dollar-question : should your program be ready for this ? (in the case of a database : ideally, yes it should, and it's in fact possible to do just that)

That's the problem with abstractions, like the "always-correct-or-null" pointers of java : they're leaky. A type system, unless reduced to pointlessness, can't really be enforced fully. Haskell ignores failure modes, like memory and stack allocation, jumping to other parts of the program, reading the program, ... all of which can in fact fail.

Thinking about this gives one new appreciation for try ... except: (catching an unspecified exception). It's not necessarily worse than a pure function. Good luck defending that position to mathematicians though.

That misses the point, I think.

Of course functions can cause a program to crash, and there are all sorts of bad things that happen that cannot be caught at the language level; Haskell doesn't save you from memory corruption, for example.

But those things don't violate the language's guarantees about type correctness. A crashing program simply ceases to run; it's not like an Int-returning function can somehow crash and return a bogus Int value.

In this sense, Haskell is no different from languages like Java or Go. It's completely orthogonal to the null problem.

This is the mathematician's argument. It boils down to the unfairness of having to execute programs on real hardware with real constraints. Well, that doesn't work in the real world obviously. Especially memory allocations WILL fail, so, frankly, deal with it. Haskell makes this impossible, and therefore throws real-world correctness out of the window because it makes mathematical correctness so much messier.

Your assertion that this can't be caught at the language level is wrong : checking malloc'ed pointers for NULLness will do it. In java, catch OOM exceptions. This error doesn't have to cause your programs to crash. Neither does an infinite loop ("easy" to catch in Java). Given more tools, you can write programs that are safe from some measure of memory corruption.

The real world is messy. Pretending it's not doesn't fix that, and nobody but mathematicians are surprised at all. End result is simple : your programs will crash after you've "proven" it can't crash. Running everything in "just-large-enough" VMs has massively exacerbated CPU and OOM error conditions, at least from where I'm sitting. I'd expect further cloud developments to make it worse.

So the type system only guarantees correctness if the following conditions hold, amongst other things:

1) infinite available memory (infinite in the sense that it is larger than any amount the program requests, haskell provides zero guarantees that limit memory usage, so ...)

2) infinite available time for program execution (again, for the normal definition of infinite)

3) no infinite loops anywhere in your program (more problematic in haskell because every haskell tutorial starts with "look, lazy programming means infinite loops terminate in special case X", and of course it only works in special cases)

Note that this is only the list of "hard" failures. There are factors that can blow up the minimum execution time of your program (e.g. VM thrashing, stupid disk access patterns) that I'm not even considering. In practice, these "soft" failures, if bad enough, cause failure of the program as well.

And only then do we get to the conditions that people keep claiming are the only conditions for haskell to work:

4) no hardware malfunctions

As others have pointed out, I'm not talking about nil as the zero-valued pointer at the hardware level. I'm talking about nil as the additional member of every type in the language's typesystem. A typesystem is an abstraction over the hardware, designed to catch programmer errors and facilitate communication among programmers. There's no reason it has to admit every possible value that the hardware can support.

And people are applying that sort of discipline - see the NullObject pattern in any OO language, or the @Nullable/@NotNull annotations in Java, or !Object types in Google Closure Compiler. The thing is, they have to apply it manually to every type, because the type system assumes the default is nullable. That makes it an inconvenience, which makes a number of programmers not bother.

I'll agree that null pointers are comparatively easy to track down compared to memory corruption caused by multi-threading. Threads are a broken programming model too, and if you want a sane life working with them you'll apply a higher-level abstraction like CSP, producer-consumer queues, SEDA, data-dependency graphs, or transactions on top of them.

> Threads are a broken programming model too

Oh come off it. Threads are in no sense 'broken' - compared to CSP or actors, they just give you a larger set of things you can write. Some of those things are bugs. Others are very useful. For example, a disruptor:

http://lmax-exchange.github.io/disruptor/

Nulls are broken because they let you write bugs, but don't let you write anything you couldn't write with options or whatever.

Really ? How would you even encode Maybe or Option in Java, if you can't use null anywhere ?

The problem is that Maybe doesn't work without Algebraic Data Types.

Oh come on, that's trivial

  public interface Maybe<T> {
      boolean hasValue();
  }

  public final class Just<T> implements Maybe<T> {
      public final T value;
      public Just(T value) {
          this.value = value;
      }
      public boolean hasValue() { return true; }
  }

  public final class Nothing<T> implements Maybe<T> {
      public boolean hasValue() { return false; }
  }
You can even get rid of hasValue (but then you need to pay for instanceof each time); or of the Nothing class and make Maybe a class. You may ask what the value of Just.value is before the constructor runs - the value is a machine null and if you somehow manage to access it before the ctor runs, that's a NullPointerException; or what writing "Type varname;" in a function would do - that would be perfectly legal, but you won't be allowed to use it if the compiler can't prove you've initialised it first (which it does right now).
Problems :

1) How do you get at the value itself ? (Casting ? That's bad)

2) How do you prevent in Maybe<Integer> x; x == null ?

3) How do you prevent someone from extending Maybe<T> ? e.g.

    public final class LetsHaveFun<T> implements Maybe<T> {
      public boolean hasValue() { throw Exception("Can't touch this");
    }
4) (you need a null check in the constructor)

5) (I dislike the autoboxing this uses)

> 1) How do you get at the value itself ? (Casting ? That's bad)

You could add the usual map method to the Maybe type - a Maybe<T> can take a Function<T, U>, and returns a Maybe<U>. If it's None, it doesn't call the function, and just returns None; if it's Some, it calls it with the value, and wraps the result in a new Some. If you want to do side-effects conditionally on whether the value is there, you just do them in the Function and return some placeholder value. You could write a trivial adaptor to take an Effect<T> and convert to to a Function<T, Void>, etc.

> 2) How do you prevent in Maybe<Integer> x; x == null ?

You're right that using a Maybe does not exclude the ability to use nulls. Nobody can deny that. The point i was making is that there is nothing useful that you can do with nulls that you cannot do with a Maybe instead.

> 3) How do you prevent someone from extending Maybe<T> ? e.g.

You can trivially control extension by making Maybe an abstract class, giving it a private constructor, and making Some and None static inner classes of it. It's a kludge, but it works!

1) Yes, casting. You need to do casting in Haskell too, it's just hidden for you by pattern-matching

2) It's for a hypothetical Java implementation that doesn't have null

3) If you really want to, make it an abstract class and do a check in the constructor that this.getClass() == Nothing.class or Just.class.

4) see 2)

5) well if you're using primitive types they are non-nullable already

Edit: to clarify, I don't expect someone to use it for Java today, it's what I would put in the standard library if Java didn't have a null in the first place.

> Threads are a broken programming model too

More specifically, threads with shared mutable state. If state is never simultaneously shared and mutable then a host of problems disappear.

And even more specifically - threads with locks. Shared mutable state is okay as long as any state-swapping operations are atomic, eg. with STM or if you build up the state in one thread, switch it with an atomic pointer swap, and don't let other threads mutate the state.
Threads with locks are fine as long as the compiler enforces that you take the lock before mutating the data (e.g. Rust). :)
Annotalysis can do this for C++ as well. The problem is that you need to enforce an ordering on locks to avoid deadlocks. That works fine if they're all within one module. It doesn't work at all if you have to coordinate callbacks across threads from different third-party libraries.

The usual solution I've seen given for this is "Don't invoke callbacks when holding a lock." This is not a viable solution for most programs.

I'm surprised no one has mentioned Rust[1] yet. It's a systems programming language that has managed to get rid of null pointers by carefully controlling the ways in which a a value can be constructed[2].

[1] http://www.rust-lang.org/

[2] https://github.com/mozilla/rust/wiki/Doc-language-FAQ#how-do...

And IIRC the compiler translates usages of None to null, so any complaint about inefficiency probably does not apply in this case, either.
This is actually quite common in languages which support ADTs—you stick tags in the pointer to discriminate between subtypes of a union.
I don't think Rust is such a shining example of a solid language in this area, at least not yet. For example: http://stackoverflow.com/a/20704252/626867
That has nothing to do with null pointers, and that was agreed to be changed today.
The SO issue you linked to has nothing to do with null pointers.
> If you have pointers, they sometimes have to start the life uninitialized (i.e. with a value of 0) hence nil pointers.

Right, and those pointers must then be declared as potentially null. Or more generally, potentially non-present values of any type need a type like Maybe T.

> As you admit yourself, the proposed solutions don't actually get rid of anything. At best they can force you to handle nil value by wrapping it in some wrapper.

That's exactly the point: you can tell from a glance at any type whether it can be null or not, and you can only look at the value of something that's guaranteed to not be null; otherwise, you have to handle the null case first. Forcing that to happen at compile time via the type system is far better than dereferencing a null pointer at runtime.

> If you have pointers, they sometimes have to start the life uninitialized (i.e. with a value of 0) hence nil pointers.

Why do your variables have to start life uninitialized?

> Why aren't people doing that? Because it comes at a cost. [...]

Not a runtime cost, though. This can be handled purely by the typesystem at compile time.

> Guess what - if you have that kind of discipline, you can do the same in C++.

Well, ironically, C++ already has non-nullable pointers. They're called references and they work extremely well.

  int& toref(int* ptr) {
      return *ptr;
  }
  //snip
  int& i = toref(0);
  i = 5; // segfault
One thing I kind of appreciate at times about C++ (or at least about certain implementations) is the fact you can call methods on null pointers and you can check in the method if it's called on a null pointer. I.e.

  class A {
  public:
    int foo() {
        return this == null ? 1 : 2;
    }
  };

  //....
  A* a = 0;
  std::cout << a->foo();
Which allows you to make classes that work just fine even if you use a null pointer.
> int& toref(int* ptr)

Yes, it's theoretically possible. The point is, you need to explicitly do evil things to get "null references". The language cannot and is not meant to protect you from yourself.

> One thing I kind of appreciate at times about C++ (or at least about certain implementations)

Yes, certain implementations indeed. It is formally undefined behavior, so anything at all could happen if you do that; the most straightforward and efficient implementation just happens to "work".

> The point is, you need to explicitly do evil things to get "null references". The language cannot and is not meant to protect you from yourself.

No, you don't need to do evil things at all. Any C api that returns a pointer which you then need to pass to a C++ function that takes a reference is a possible source of failure if you forget to check your pointer. It can happen quite easily by mistake. The language cannot in any way guarantee that you won't make a reference from a null pointer. At least taking a reference as an argument does at least alert people that you're not expecting to be passed a null as an argument to your function.

A fair point. Still, only having to check for nulls on certain API boundaries is much less work than the alternative, and it's easy to train oneself to automatically spot "dangerous" points where a pointer-to-reference conversion is done.