Hacker News new | ask | show | jobs
by zahlman 624 days ago
The article does mention Python:

> We didn’t invent this idea. Python and Go were both doing this before us, and we modeled our approach on Python’s.

However, this is referring to the actual hashing. Also, Python was doing it for a fundamentally different reason (to avoid denial-of-service attacks caused by pathological input where every key would map to the same hash bucket).

However, starting with Python 3.6, the dictionary implementation doesn't actually store the key-value pairs directly in the hash buckets. Instead, it stores indices for a separate table. This saves space (because the indices can be much smaller than full entries) and also incidentally makes it easy to iterate over key-value pairs in order of insertion (by iterating over the table).

This proved popular, and in 3.7 it became official that this preservation of insertion order would be guaranteed going forward. Note that this is not a "sorted" dictionary structure; the keys are not compared or ordered by value.

Note that space is not freed up in the table when an element is deleted; this would require either O(N) shifting of the data, or ruining the insertion order. As such, Python developers accidentally cheated themselves out of a further optimization by deciding to offer the guarantee.

I have some notes (and a quick demo program) on GitHub demonstrating the space performance characteristics across Python versions: https://github.com/zahlman/python-dict-stats/ . This includes a link to https://stackoverflow.com/questions/327311 which gives a lot of detail about the implementation.

>Now, reading Python docs on that topic, they briefly[1] mention that dictionaries’ list views yield items in the order they were inserted without any mentions if items are sorted or not.

It seems clear enough to me:

> Performing list(d) on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use sorted(d) instead).

The insertion order guarantee is more formally documented in the What's New in Python 3.7 documentation (https://docs.python.org/3/whatsnew/3.7.html), which cites the original decree on the mailing list (https://mail.python.org/pipermail/python-dev/2017-December/1...).