Hacker News new | ask | show | jobs
by jjtheblunt 1990 days ago
isn't generic Sets easily implemented with map being already generic?
5 comments

> isn't generic Sets easily implemented with map being already generic?

Since Go has neither generic functions nor generic typedefs you can't implement a Set with a generic key type on top of map, you have to reimplement all the set operations for each key type you use.

I think map[T]bool is already a pretty good set; the only things you can do with sets are insertion, deletion, iteration and checking for existence and they're all well-supported.

Of course, if you need a concurrent set you're right back in type system hell.

You can’t write intersection, union, difference, subset (contains all), or powerset as reusable functions for any element type. The idiomatic thing for now is to rewrite them as loops over and over, but that’s error prone, hard to read, and not a good use of time.
This was the exact issue I hit. There are set libraries for Go, but they only handle standard types. The for loops got old quick.
In addition to that, map[T]bool only works if T is one of the few types that Go can check for equality automatically. You can't define a custom equality (+hash) function for your type and use it with the built-in map.
I see what you mean there.
Not if you need to implement set operations (union, intersect, etc.) on your own each time.

Generic maps as sets are only ok for membership checks.

One challenge there is that identity is only supported for some built-in types — only primitives, and structs of primitives; no pointers, slices, maps.

If you want a set of some complex kind of value that contains non-map-indexable types like slices and pointers, then you have build an indirection around it.

A good set implementation needs to support a comparison operation. I really wish this existed for Go maps, too.

For some uses of a set, yeah. For intersections / unions etc. nope. Then it's back to for loops, so many for loops in Go.
Yes and that's probably why there is no set in the go std lib. You just can use struct{}{} as (empty) value in a map.
As an aside, this is exactly how HashSets are implemented in the Rust standard library.[0]

[0]: https://rust-lang.github.io/hashbrown/hashbrown/hash_set/ind...

I thought the idomatic approach was to represent sets with map[key]bool
The problem of doing that is twofold:

* each key now has 3 possible states (true, false, and unset) rather than two

* a bool takes 1 byte to store (which may get more problematic due to alignment, I've never checked what the memory layout of go's map is so I don't know how much of a concern it is there)

An empty struct fixes these issues: a key being present means the item is in the set, and an empty struct is zero-sized.

edit: apparently go maps are laid out as buckets of 8 entries with all the keys followed by all the values, so there's no waste due to padding at least.

As someone who finds "indicating intent in the code" an important thing, I must admit I find this concert slightly horrifying. A Map and a Set are two different things and which one you use conveys some intent as to what you mean by your code. I get that it works, but it would still make me unhappy to do.
Ideally, if the language/standard library provides maps but not sets, and you wanted to use the idiomatic set = map of type -> bool approach, you'd create a wrapper so that intent is preserved but users don't have to know about the backing mechanism. Of course, it's obnoxious if everyone has to do this themselves and the language lacks generics so you have to write this once for each potential type.
Wrappers that don't wrap very much aren't worthwhile IMO. Its like getting an amazon box with a fedex box inside. Just give me the package itself.
Shallow wrappers that don't wrap much (now) but convey intent better are valuable if you buy into the idea of modularity and encapsulation in general. Some reasons (probably more out there):

1. The now-provided interface can more clearly express what the code is intending to do (better names for the operations you're providing than the underlying system has, remove_from_end to pop or dequeue)

2. Hide methods of interacting with the underlying data structures that you don't want people to use (use a C++ vector as a stack, but don't want random access)

3. You can replace the underlying mechanisms at will without impacting the users

If you just wrap a vector in your own vector class and otherwise provide the same operations (or a limited set of operations but for no good reason to restrict usage), sure, that's moronic. But if you wrap a vector class in a "BigNumber" class and provide operations like add, subtract, mod, etc. then value has been added. Same thing with the idea of wrapping a map in a set interface.

> A Map [with value type = void/unit/() [0]] and a Set are two different things

Not to defend Go or anything, but that's like saying:

| A Array [with element type = byte/char/u8] and a String are different things

It might be useful to call them different names (of course that would require Go to support generic typedefs for `type Set k = Map k Void`), but they're still fundamentally the same thing.

0: which, to be fair, is not the same thing as a map with bool values.

Fundamentally, all values are a collection of bits. That doesn't mean having distinct structures built on top of those bits isn't useful.
A map[T]bool has 3 states for every key; absent, true, and false. A map[t]struct{} has 2 states for every key; absent, and present.

People new to Go tend to pick map[T]bool or map[T]int because they're used to using bools and ints throughout their code, but struct{} is the correct value type for sets. (That is not to say that a counting set, map[T]int is useless, however. If you need that, use that!)

I usually use map[T]bool because

  if _, ok := wasTouched[thing]; !ok {
    touch(thing)
    wasTouched[thing] = struct{}{}
  }
is way uglier than

  if !wasTouched[thing] {
    touch(thing)
    wasTouched[thing] = true
  }
That doesn’t actually work unless you first fill all possible values with `false`.
Wrong. The whole point of this construct is that in Go maps return default values for keys that do not exist. The default value for bool is false.
Hehe. It definitely was.. I think this is also still somewhere in "Efficient Go". However this seems to have changed in recent years. I was surprised by this too and personally I still prefer the bool even though it uses a bit more memory.

People argue there are 3 states but it is meaningless in my opinion because you can just ask exists := someMap[someKey] without checking for existence as you do with real maps. Here false is equivalent to not existent.

The bool is meaningless, so empty struct is more clear as it contains no state.