Hacker News new | ask | show | jobs
by kerkeslager 1221 days ago
I can't speak for Haskell as I haven't looked at their implementation, avoiding copies is pretty typical of immutable data structures. Because each component of the data structure is immutable, you can safely point to any part of the structure that you don't need to change rather than copy it.

For example, in a basic binary search tree implementation of a map (using C-ish syntax for those who don't know a functional language):

    struct Node {
      String key;
      int value;
      Node left;
      Node right;
    }

    Node set(Node n, String key, int value) {
      if(n == null) {
        return new Node { key = key, value = value, left = null, right = null };
      }

      if(key < n.key) {
        return new Node {
          key = n.key,
          value = n.value,
          left = set(n.left, key, value),
          right = n.right
        };
      }

      if(key > n.key) {
        return new Node {
          key = n.key,
          value = n.value,
          left = n.left,
          right =  set(n.right, key, value)
        };
      }

      return new Node { key = key, value = value, left = n.left, right = n.right };
    }
A perfectly-balanced tree with depth of 5 has 1 + 2 + 4 + 8 + 16 = 31 nodes. If you call the above function, on such a tree, the worst case scenario is that the key doesn't exist, so it modifies 5 nodes during its search and creates a new sixth node. 26 of the original 31 nodes are reused and referenced by the newly-created map. The percentage of nodes reused only improves as the perfectly-balanced tree gets larger.

Of course, if this is your implementation of set(), the tree won't be perfectly-balanced, so a production implementation of a tree-based map needs tree-rebalancing (as well as memory-reordering and compacting for cache locality). These extra constraints typically mean less of the tree can be re-used, but the percentage of nodes which can be reused remains high.