Hacker News new | ask | show | jobs
by impoppy 620 days ago
Didn’t Python tackle a similar problem? I remember that dictionaries were known to be unsorted, however, when writing tests, I’ve noticed that the items were always in the same order. I don’t remember what was I looking up, but my manager and I came to a conclusion that depending on order items of a dictionary was acceptable and having to fix the tests if they somehow broke later was an okay tradeoff. Now, reading Python docs on that topic, they briefly[1] mention that dictionaries’ list views yield items in the order they were inserted without any mentions if items are sorted or not.

[1] briefly in this regard means I’ve seen it in the docs exactly once without much attention paid to that sentence

https://docs.python.org/3/tutorial/datastructures.html#dicti...

2 comments

This behaviour was introduced in 3.6 (and made part of the spec in 3.7 iirc)

From the python 3.6 change log:

New dict implementation¶ The dict type now uses a “compact” representation based on a proposal by Raymond Hettinger which was first implemented by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5.

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in bpo-27350. Idea originally suggested by Raymond Hettinger.)

https://docs.python.org/3.6/whatsnew/3.6.html#new-dict-imple...

The article does mention Python:

> We didn’t invent this idea. Python and Go were both doing this before us, and we modeled our approach on Python’s.

However, this is referring to the actual hashing. Also, Python was doing it for a fundamentally different reason (to avoid denial-of-service attacks caused by pathological input where every key would map to the same hash bucket).

However, starting with Python 3.6, the dictionary implementation doesn't actually store the key-value pairs directly in the hash buckets. Instead, it stores indices for a separate table. This saves space (because the indices can be much smaller than full entries) and also incidentally makes it easy to iterate over key-value pairs in order of insertion (by iterating over the table).

This proved popular, and in 3.7 it became official that this preservation of insertion order would be guaranteed going forward. Note that this is not a "sorted" dictionary structure; the keys are not compared or ordered by value.

Note that space is not freed up in the table when an element is deleted; this would require either O(N) shifting of the data, or ruining the insertion order. As such, Python developers accidentally cheated themselves out of a further optimization by deciding to offer the guarantee.

I have some notes (and a quick demo program) on GitHub demonstrating the space performance characteristics across Python versions: https://github.com/zahlman/python-dict-stats/ . This includes a link to https://stackoverflow.com/questions/327311 which gives a lot of detail about the implementation.

>Now, reading Python docs on that topic, they briefly[1] mention that dictionaries’ list views yield items in the order they were inserted without any mentions if items are sorted or not.

It seems clear enough to me:

> Performing list(d) on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use sorted(d) instead).

The insertion order guarantee is more formally documented in the What's New in Python 3.7 documentation (https://docs.python.org/3/whatsnew/3.7.html), which cites the original decree on the mailing list (https://mail.python.org/pipermail/python-dev/2017-December/1...).