Hacker News new | ask | show | jobs
by fileeditview 1990 days ago
Yes and that's probably why there is no set in the go std lib. You just can use struct{}{} as (empty) value in a map.
2 comments

As an aside, this is exactly how HashSets are implemented in the Rust standard library.[0]

[0]: https://rust-lang.github.io/hashbrown/hashbrown/hash_set/ind...

I thought the idomatic approach was to represent sets with map[key]bool
The problem of doing that is twofold:

* each key now has 3 possible states (true, false, and unset) rather than two

* a bool takes 1 byte to store (which may get more problematic due to alignment, I've never checked what the memory layout of go's map is so I don't know how much of a concern it is there)

An empty struct fixes these issues: a key being present means the item is in the set, and an empty struct is zero-sized.

edit: apparently go maps are laid out as buckets of 8 entries with all the keys followed by all the values, so there's no waste due to padding at least.

As someone who finds "indicating intent in the code" an important thing, I must admit I find this concert slightly horrifying. A Map and a Set are two different things and which one you use conveys some intent as to what you mean by your code. I get that it works, but it would still make me unhappy to do.
Ideally, if the language/standard library provides maps but not sets, and you wanted to use the idiomatic set = map of type -> bool approach, you'd create a wrapper so that intent is preserved but users don't have to know about the backing mechanism. Of course, it's obnoxious if everyone has to do this themselves and the language lacks generics so you have to write this once for each potential type.
Wrappers that don't wrap very much aren't worthwhile IMO. Its like getting an amazon box with a fedex box inside. Just give me the package itself.
Shallow wrappers that don't wrap much (now) but convey intent better are valuable if you buy into the idea of modularity and encapsulation in general. Some reasons (probably more out there):

1. The now-provided interface can more clearly express what the code is intending to do (better names for the operations you're providing than the underlying system has, remove_from_end to pop or dequeue)

2. Hide methods of interacting with the underlying data structures that you don't want people to use (use a C++ vector as a stack, but don't want random access)

3. You can replace the underlying mechanisms at will without impacting the users

If you just wrap a vector in your own vector class and otherwise provide the same operations (or a limited set of operations but for no good reason to restrict usage), sure, that's moronic. But if you wrap a vector class in a "BigNumber" class and provide operations like add, subtract, mod, etc. then value has been added. Same thing with the idea of wrapping a map in a set interface.

Wrapping to hide is valuable, but wrapping has a cost which is generally underrated. Every wrapper is a thing itself which must also be understood when trying to understand how things work. And every wrapper is a division between blocks of code, meaning if you have changes which impact multiple layers of wrap, its harder to determine what to change, and to maintain the understandability of each layer.

For this reason im an advocate of lazy wrapping. Create an abstraction at the last moment, when its painfully obvious what benefit it will provide, when you can see how it ties together disparate pre-existing code blocks, and when you have the highest confidence that it will stick and not need to be unwrapped next week by the senior dev.

> A Map [with value type = void/unit/() [0]] and a Set are two different things

Not to defend Go or anything, but that's like saying:

| A Array [with element type = byte/char/u8] and a String are different things

It might be useful to call them different names (of course that would require Go to support generic typedefs for `type Set k = Map k Void`), but they're still fundamentally the same thing.

0: which, to be fair, is not the same thing as a map with bool values.

Fundamentally, all values are a collection of bits. That doesn't mean having distinct structures built on top of those bits isn't useful.
A map[T]bool has 3 states for every key; absent, true, and false. A map[t]struct{} has 2 states for every key; absent, and present.

People new to Go tend to pick map[T]bool or map[T]int because they're used to using bools and ints throughout their code, but struct{} is the correct value type for sets. (That is not to say that a counting set, map[T]int is useless, however. If you need that, use that!)

I usually use map[T]bool because

  if _, ok := wasTouched[thing]; !ok {
    touch(thing)
    wasTouched[thing] = struct{}{}
  }
is way uglier than

  if !wasTouched[thing] {
    touch(thing)
    wasTouched[thing] = true
  }
That doesn’t actually work unless you first fill all possible values with `false`.
Wrong. The whole point of this construct is that in Go maps return default values for keys that do not exist. The default value for bool is false.
Hehe. It definitely was.. I think this is also still somewhere in "Efficient Go". However this seems to have changed in recent years. I was surprised by this too and personally I still prefer the bool even though it uses a bit more memory.

People argue there are 3 states but it is meaningless in my opinion because you can just ask exists := someMap[someKey] without checking for existence as you do with real maps. Here false is equivalent to not existent.

The bool is meaningless, so empty struct is more clear as it contains no state.