| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by alayne 2328 days ago
	Other languages don't generally have special order guarantees about standard maps. This seems very idiosyncratic.

10 comments

pimlottc 2328 days ago

For almost all intents and purposes, object keys have been ordered in JavaScript since ES2015. Map has always been ordered.

https://www.stefanjudis.com/today-i-learned/property-order-i...

link

jnwatson 2328 days ago

Users of JSON, probably the most common data interchange format on the planet, frequently have implicit requirements about key ordering. It is highly convenient to be able to parse a JSON string into a native Python data structure, add a field, emit it back, and preserve the ordering.

link

anthonypasq 2328 days ago

in what reasonable use case would the order of the properties on an object matter? I can't think of one

link

jwkane 2328 days ago

when you are diffing the serialized output?

link

philwelch 2328 days ago

From what I recall of the JSON standard itself, there's no guarantee about key ordering being significant. If you're diffing serialized output to compare two JSON objects you need to be serializing it in a consistent format, otherwise even whitespace is going to throw you off.

link

jwkane 2328 days ago

It's significant to a human that wants to know what has changed.

link

philwelch 2328 days ago

If a human is inspecting serialized JSON using pen and paper, the human is presumably clever enough to match up key for key regardless of ordering.

If the human is using a computer to compare two JSON payloads (as the use of a diffing algorithm suggests), the human and computer should be clever enough as a team to realize that they could just deserialize and reserialize each JSON payload such that the keys were lexicographically sorted and the data was pretty-printed in the exact same way before running it through the diffing algorithm. `jq -Sc` would do the trick.

link

anonymoushn 2328 days ago

So you can sort the keys at serialization time rather than paying a performance penalty all the time?

link

wutbrodo 2328 days ago

I tracked down a bug once where someone was hashing state that happened to include Python dicts serialized to JSON, which caused different hash values for the same entity. I know, I know, insert tears of blood emoji, and any engineer worth his salt wouldn't be so sloppy. But you don't always get to choose to work only with top talent that understands why you should design things elegantly as a baseline[1]. In these cases, removing implicit gotchas like "the same code has the same behavior" (ie iteration over the same dict) is valuable.

[1] Well you _can_ choose to do so, and I recently have, but it's not without its costs.

link

gmanley 2328 days ago

I can't speak for all "other" languages but Ruby made the change years ago from 1.8 to 1.9. Ruby calls them hashes but they're standard maps. Obviously Ruby and Python are very similar but I think the point stands.

link

_whiteCaps_ 2328 days ago

Golang map iteration is returned in random order specifically to make sure that people don't rely on the order.

I think this feature says a lot about the philosophy of Python vs Go.

link

MereInterest 2328 days ago

Sounds like a fast and idiomatic way to shuffle a deck of cards is then to convert to a map and back.

link

thedirt0115 2328 days ago

Convert a deck of cards ([]Card?) to a map (map[Card]bool?) and back just to shuffle? That's unlikely to be faster or more idiomatic than a straightforward implementation of the Fisher-Yates shuffle[1]. Try writing the code to do it both ways and compare.

[1]: https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

link

kragen 2327 days ago

I think "idiomatic" would be using rand.Perm, which implements the Fisher%E2%80%93Yates shuffle. But aside from whether it's idiomatic, converting to a map isn't random enough.

link

kragen 2327 days ago

With Golang 1.11, I get orderings like this. S is spades, H is hearts, D is diamonds, and C is clubs, because HN eats the Unicode characters I was actually using.

     S9  SJ  SK  HQ  D9  S4  S5 S10  HJ  S8  H6  DA  D2
     D4  DQ  C6  C8  CJ  H3  H9  DK  C3  CQ  SA  S2  S3
     S6  SQ  H4 H10  D8 D10  C4  CK  S7  H7  D5  D6  D7
     C7  H2  HK  D3  DJ  C2  C5  C9  HA  H5  H8  CA C10

On average you would expect about one card in a shuffled deck to be (cyclically) followed in the deck by its succeeding card, and on about one out of 50 shuffles, that card would be followed by its successor. Here we have S4 S5, DA D2, SA S2 S3, and D5 D6 D7.

This is very strong evidence of bias.

I'm interested to hear if my Golang can be made more idiomatic, or if there are bugs in it: http://canonical.org/~kragen/sw/dev3/mapshuffle.go

In particular it seems like there ought to be a less verbose way to express the equivalent of the Python list({card: True for card in deck}) in Golang.

link

kragen 2328 days ago

No, it isn't random enough for that.

link

jarekkruk 2328 days ago

IIRC it used to just start iteration at a pseudorandom index and then iterate normally. I looked at it couple years ago, don't know if they changed it.

link

kragen 2327 days ago

It seems to do that, but I think it also tweaks the hash function for each newly created map, because in the code linked from https://news.ycombinator.com/item?id=22278753 I'm not getting rotations of the same iteration order when I generate two maps. There doesn't seem to be any randomness in mapiternext() itself.

You could imagine that the hash function itself might do an adequate job of randomizing the order of the cards, though, especially if salted with a per-map salt. SipHash, for example, would probably not have any detectable biases in the distribution of the permutations thus produced. But whatever hash function Golang is using for my structs has an easily visible bias, as described in that comment.

link

gowld 2328 days ago

What does it say? That Python is willing to spend more computation time for simplicity, without waiting for the programmer to ask for it? That Python will make semantic changes in minor version bumps of the language?

link

jwkane 2328 days ago

The new-ish ordered dictionary is actually faster. It is also completely backward compatible for any but the most contrived example.

Now wasting computation on randomizing... that is a problem.

link

pletnes 2328 days ago

The ordering property is a side effect of the new and more cpu/memory efficient data structure. It would be very surprising to say no to a 2x performance jump to avoid the (sometimes useful) ordering.

link

andrewshadura 2328 days ago

Tcl's dicts are ordered.

link

dukoid 2328 days ago

Java added LinkedHashMap a while ago. I tend to default to it except for small temporary use cases where order absolutely won't matter or have an outside impact...

link

masklinn 2328 days ago

LinkedHashMap is not Java's "standard maps" though. If you tend to use it by default it's really a personal idiosyncrasy, especially as LLHM tend to be larger and slower than regular hashmaps due to having to maintain a doubly linked list.

Python has had one such in the standard library for a decade or so.

link

xxs 2328 days ago

LHM is not slower for iteration (it's faster actually).

LHM indeed pays 2 references per node but they are well worth as it has deterministic ordering/iteration and I have witnessed numerous cases with HashMap that show up in production only due to iteration order (esp after rehashing). The code is broken but the cases did not reproduce during testing...

Now if the two extra references are an issue, consider that HashMap is quite space inefficient with having a dedicated node per each entry - that's the main price. The (simple) node memory footprint is 36bytes on heaps less than 32GB, i.e. compact pointers. The extra references add another 8 bytes for having an insert or access order.

If the goal is getting a compact low memory footprint, HashMap is not fit for purpose. Overall it's the jack of all trades and even got reworked (java8) to support tri-based collision resolution for keys that implement Comparable.

Couple years back I wrote CompactHashMap (under CC0) that has an average cost of 10bytes per entry and it's extremely compact for small sizes with having only 2 fields on its right own, so even small/empty maps are tiny. In microbenchmarks (same used in openjdk) it handily (2x) beats java.util.HashMap on "Traverse key or value", get/put have similar performance, and "Search Absent" is worse.

The point is: LHM should be the go-to hashmap for java as being node based hashmap is justified (unlike java.util.HashMap)

link

oh-4-fucks-sake 2328 days ago

Yep. Also, LinkedHashMap maintains ordering based on insertion order. So, iff you insert your data ordered how you want it, it's behaving as an ordered map. If you want true, self-reordering map, what you want is a TreeMap. Beauty with that one is that you have full control over defining the custom ordering function, because we're not always indexing a map by primitives or autoboxed type. I try to use it sparingly though as you're paying log(n) on pretty much all map operations.

While we're here, another tidbit that's often overlooked in the Java collections: If you reallly care about iteration performance, your data is without nulls, your data is already ordered how you like it or you don't care about ordering, your qty items >= 10, and you don't need random access, then ArrayDeque is gonna be your horse because of how much better it co-locates its contents in memory and how much less overhead is required to maintain it during each operation compared to all the other List implementations, including ArrayList and LinkedList.

link

Izkata 2328 days ago

In regards to sibling replies, does it feel to anyone else like everything (except golang) is converging towards PHP's array() ?

link

masklinn 2328 days ago

Interestingly enough here it was kinda but kinda not the other way around: historically PHP used a closed-addressing hash map and threaded a doubly linked list through it to maintain the insertion order.

But the dict TFA talks about doesn't use a doubly linked list, or closed addressing, its ordering is a side-effect of its implementation but not originally a core goal (memory saving and iteration speed were). It'd probably been proposed by others before but it came to wider attention after Raymond Hettinger (a core dev) proposed it a few years back[0]. PHP actually released it first[1], closely followed by pypy[2]. CPython only got around to it some time later[3]

[0] https://mail.python.org/pipermail/python-dev/2012-December/1...

[1] https://nikic.github.io/2014/12/22/PHPs-new-hashtable-implem...

[2] https://morepypy.blogspot.com/2015/01/faster-more-memory-eff...

[3] https://docs.python.org/3/whatsnew/3.6.html?highlight=3.6#wh...

link

l_t 2328 days ago

JavaScript Maps are iterated over in insertion order.

link

reubenmorais 2328 days ago

std::map in C++ stores keys in order. You have to use std::unordered_map to not get that behavior.

link

xyzzyz 2328 days ago

Yes, but that’s different kind of order. Python dicts order is insertion order, while std::map is key order.

link