Hacker News new | ask | show | jobs
by Singletoned 3505 days ago
> I'm a Python programmer. I fully expect dict keys to be able to be arbitrary objects.

I'm a Python programmer too, and I also fully expect dict keys to be able to be arbitary objects, and I get really frustrated with the fact that they can't be arbitary objects. They have to be hashable objects, and the hash function refuses to hash certain objects that it has decided aren't allowed.

5 comments

Well, you cannot use a mutable object as a key without some help. What would you expect to happen if the object is modified after being inserted into the dictionary?

If you have a list or dictionary you want to use a key, you can convert it into a tuple which would make it immutable.

If you don't care about performance for larger collections you could just use a list instead of a dictionary, which does not require hashing. Changes to the original mutable object will also be reflected in your list.

  > What would you expect to happen if the object is modified after being inserted into the dictionary?
I would expect it to continue to use the object. Why would it matter if it was mutated or not?

You can try it right in your browser with Javascript and the Map() object type. You can mutate the object all you want, as long as it's the same object(-reference) you access the same value in the Map object. Just as I would expect. "Mutable" doesn't change which object you have, only what it looks like inside.

This is where Pythonistas rely on the idea that "we're all consenting adults here". It's very useful to let objects assert that they have an arbitrary, immutable hash value. I've used that ability in Python and it solved problems that would have been very challenging otherwise. Of course, that ability also leads to undocumented behavior if hash values change. Python coders expect dicts to behave in undocumented ways when object hashes change, but it would be a bug if that led to a segfault or a C assertion failure.
I'm genuinely curious how/why this is a problem for Python yet I've never heard of it being a problem for C#. In C# you can override Object.GetHashCode(), and there's no way in C# to even express the constraint/desire "the result of GetHashCode doesn't change (especially after insertion into a container)". Yet I've never heard of anyone in C# having a problem using mutable objects in sets or dictionaries?
To be honest, most Python coders probably never get deep enough in the language to start overriding hash values, and those who do probably understand the implications before they get that deep.
This fundamentally reduces the usefulness of hash representations though. Instead of thinking of dict keys, think sets.

It's also worth pointing out that user-defined objects operate the way you seem to desire (e.g. default to using the id() property of the object to derive the hash). So in my mind, this strikes the perfect balance?

StdLib types that are immutable bake in more comparative hash properties while making it trivial for a user to wrap them and override this richer behavior with default hash based on id().

Consider:

```

>>> class myList(object):

... def __init__(self):

... self.inner_list = list()

...

>>> x = myList()

>>> x

<myList object at 0x10bdabf10>

>>> x.inner_list

[]

>>> y = {x: 'hi'}

>>> y

{<myList object at 0x10bdabf10>: 'hi'}

>>> x.inner_list.append(1)

>>> x.inner_list

[1]

>>> y

{<myList object at 0x10bdabf10>: 'hi'}

>>> for k in y.keys():

... print k.inner_list

...

[1]

```

  > This fundamentally reduces the usefulness of hash representations though.
If you want something immutable just put something immutable in?

  > Instead of thinking of dict keys, think sets.
It's the same that I already wrote for sets, and also the same I just wrote: If you want something immutable just put something immutable in.

By the way, I'm not talking about any particular implementation.

I think there is a disconnect here, I showed you how you could place a mutable object as a dict key in Python. One which uses the object itself (id() property) for the hash.

> If you want something immutable just put something immutable in.

My point is what do you expect if you do this:

>>> x = (1, 2)

>>> y = (1, 2)

>>> set([x, y])

I would argue that you would expect:

>>> set([(1,2)])

Since x is equal to y. But if you base the hash by default on id, you would get:

>>> set([(1, 2), (1, 2)]) # set([x, y])

Since x and y are different instances (have different id()) properties. I would argue that it's more useful/predictable for built-ins to hash based on content rather than id().

However, you can still hash based on id() but simply creating your own object (as I demonstrated). So you get sensible defaults for built-ins but also the option of hashing mutable objects.

We were originally talking about a dictionary key, and the purpose of an object as key in a dictionary is AFAICS to associate a value with that concrete object. Whenever I use those structures I have to solve an engineering problem, not a math problem. I would claim that's the majority of use cases, a claim that seems to be supported by how it is commonly implemented (see Java below and Javascript for two of the most common languages).

Anyway, it's configurable for example in Java (http://stackoverflow.com/questions/9440380/using-an-instance...) - default is to use the reference but you can override object methods to change that. Similarly for C#(http://stackoverflow.com/questions/634826/using-an-object-as... or http://stackoverflow.com/questions/8952003/how-does-hashset-...).

> What would you expect to happen if the object is modified after being inserted into the dictionary?

This question is sometimes not important: When the mutable object is not mutated.

> This question is sometimes not important: When the mutable object is not mutated.

Which you can't guarantee when the object is mutable, so for consistency's sake it's left out.

Though you can always build your own Dict subclass and implement __hash__ on it and be angry with yourself when it fails. :)

I was going to reply to him, but you beat me to it. But since you posed a different question regarding the term hashable, I thought I would link some helpful starting docs (for you and/or others):

https://docs.python.org/3/glossary.html#term-hashable

http://stackoverflow.com/questions/4348232/python-dictionary...

Come on, how many time in your entire life did you need this ?

For set, it's a bit more annoying. But for dicts. twice in 10 years maybe ?

After a few years of Clojure(Script), where immutability means you can use almost anything as keys, I have found many cases where my python code would have been simpler and easier if I could use arbitrary objects (and therefore data structures) as keys.

I mean, its not a huge deal, but it adds up.

> I have found many cases where my python code would have been simpler and easier if I could use arbitrary objects as keys

You can use pretty much anything as a dictionary key

That's really not true[1]. You can only use hashable types, which excludes dicts, lists and sets.

For example (tested on both python 2.7 and 3.4):

    >>> {{'a': 1}: 2}
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'dict'
    >>> {[1,2,3]: 4}
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'
    >>> {{1,2,3}: 4}
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'set'
Objects inheriting from object are hashable by default, but the hash is based on the objects instance such that each instance will return a unique value for that instance:

    >>> class A(object):
    ...     def __init__(self):
    ...         self.x = 1
    ...
    >>> a = A()
    >>> b = A()
    >>> d = {a: 1, b: 2}
    >>> a.x = 10
    >>> b.x = 10
    >>> d[a]
    1
    >>> d[b]
    2
While this makes sense for the instance stored in the dict (because mutability would otherwise mean that they key changes values), I don't think this is particularly useful. I rarely look up dicts by instance, but rather by value. That is, I would construct another object with the same attributes and look up by that.

You can, of course, implement your own __hash__ to make it work, but you have to do it manually for any object you want it and most third-party objects won't have implemented it so you'd have to monkey patch them.

Contrast that with Clojure, where all built-in data structures (maps, sets, lists, dicts etc) work as keys out of the box.

[1] I guess its true in the sense that you can implement __hash__ to make it work. But its not particularly easy or idiomatic.

It has been quite common when wanting to use a dictionary as a cache or counter such as a function cache or to count how many times a function is called with certain arguments. If the values of the arguments aren't hashable it doesn't work.
As an aside, what happens in Python when you try to hash objects that contain circular references?