Hacker News new | ask | show | jobs
Hashing it Out (akshayr.me)
49 points by barbierocks 2167 days ago
2 comments

Here's a cute follow-up: what are all possible outputs of the following program?

    import random

    class C:
        def __hash__(self): 
            # return 0 or 1, with equal probability
            return random.randint(0, 1)

    x = C()
    d = {}

    d[x] = "foo"
    d[x] = "bar"
    d[x] = "baz"

    print(len(d), d[x])

    1 baz
    1 None (?)
    2 foo
    2 bar
    2 baz
If __hash__(x) is recalculated on every indexing (?).
I think that's right.

On a side node, if I recall my old python knowledge right the <1 - baz> and <1 - None> cases are only possible because -- well for one it uses only one object, so the fast check "key is elem" is always true -- but also the object utilizes the default __eq__ function, which tests on object identity. And since the same instance of C is used for all object invocations, this is always true.

The behavior would be different using:

        x = C()
        y = C()

        d[x] = 'foo'
        d[y] = 'bar'
        d[x] = 'bar'
        d[y] = 'qux'
The hash might collide, but the equality will cause the open addressing to skip the element.

I dislike recalling debugging a hash function that wasn't entirely deterministic in some edge cases.

Essentially correct! Instead of `1 None` it'd fail with a `KeyError`. You can get a probability distribution too:

    1 baz       1/8
    KeyError    1/8
    2 foo       1/8
    2 bar       1/4
    2 baz       3/8
Indeed, I got

  Counter({
      (1, 'baz'): 12525,
      (1, 'KeyError'): 12552,
      (2, 'foo'): 12255,
      (2, 'bar'): 25282,
      (2, 'baz'): 37386
  })
from running

  import random
  
  class C:
      def __hash__(self): 
          # return 0 or 1, with equal probability
          return random.randint(0, 1)
  
  def output():
      x = C()
      d = {}
      d[x] = "foo"
      d[x] = "bar"
      d[x] = "baz"
      try:
          return len(d), d[x]
      except Exception as e:
          return len(d), type(e).__name__
  
  from collections import Counter
  Counter(output() for i in range(100000))
I think it would be

  KeyError
instead of

  1 None (?)
but otherwise looks like the right answer to me too.
The real treasure of akshayr.me:

https://akshayr.me/snoop/

LOL