by the__alchemist 844 days ago
Go and Python have OK enums. I will use them, but they could be simpler and more expressive. This raises the question: is there an obstacle to releasing better enums in the next Python and Go versions? If the concern is breaking backwards compatibility, I would be OK with a new type. Is it a culture issue, i.e. that Python and Go programmers don't use enums much? (Chicken and egg here.)

Rust's enums are great. No "auto" boilerplate if not mapping to an integer, exhaustive pattern-matching, sub-types etc.


> Rust's enums are great.

Rust doesn't have enums. It has sum types – that for some reason it arbitrarily decided to call enums.

Sum types are great. There is a good case to be made that Go would benefit from the addition of sum types. But until that day there isn't much more you can do with enums. That's all enums are – a set of named constants.

Rust's enums are entirely unlike C enums, and reasonably similar to Java enums.

Not the first time a word has been used for several nearly-unrelated concepts, and it won't be the last.

I've heard this before, but I struggle to understand the abstraction.

I make heavy use of Rust and Python enums. (Are they both misnamed?) Those plus structs are generally the base of how I structure code.

The "enums" in the article also seem to be of the same intent. Is this a "no true Scotsman" scenario?

Some research implies the difference is a True Enum involves integer mapping, while a Sum Type is about a type system. I think both the Rust and Python ones can do both. (repr(u8) and .value for Rust/Python respectively)

The use case is generally: You have a selection of choices. You can map them to an integer or w/e if you want for serialization or register addresses etc, but don't have to. Is that a sum type, or an enum? Does it matter?

Another thought:

Maybe:

  #[repr(u8)]
  enum Choice {
    A = 1,
    B = 2,
  }
Is an enum, while

  enum Choice {
    A(C),
    B(D),
  }
Is a sum type?
Enumerations map back to integers. Enumerations can have iterators written on them that exhaustively enumerate the possible values. (Sum types either have no such enumeration at all, or in general, they're useless, so you don't see them.) Enumerations can be represented by a canonical and small set of strings, if you want a string backing them.

This is what an enumeration is, partially because that's precisely what the word "enumeration" means; the ability to assign an ordinal number to each value in the enumeration. To "enumerate" a set is to assign integers to them. In Python, for instance, see the "enumerate" function, which does exactly enumeration on the output of some iterator.
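As a rough sketch of the "assign an ordinal to each value" idea in Rust: `Direction`, `ALL`, `ordinal`, and `enumerated` below are made-up names for illustration, not anything standard. Rust does not generate the list of values for you, so this sketch assumes it is written out by hand.

```rust
// Hypothetical fieldless enum; every name here is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    North,
    East,
    South,
    West,
}

impl Direction {
    // Every value, listed once by hand, in declaration order.
    const ALL: [Direction; 4] = [
        Direction::North,
        Direction::East,
        Direction::South,
        Direction::West,
    ];

    // The ordinal of this value: its 0-based position in declaration order.
    fn ordinal(self) -> usize {
        self as usize
    }
}

// Pair each value with its ordinal, much like Python's `enumerate`.
fn enumerated() -> Vec<(usize, Direction)> {
    Direction::ALL.iter().copied().enumerate().collect()
}
```

The `as usize` cast only compiles because no variant carries data, which is roughly the restriction being described.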

Sum types can be used to represent enumerations, but it's a very restrictive subset of sum types. Trying to understand what a "sum type" is through the lens of a single integer would be a very strange way to approach them. Nor are sum types a "superset" of an enumeration; a base sum type is not an enumeration. You need to add more things to it to get an enumeration. In a Venn diagram they're the classic two circles with some overlap in the middle but with distinct bits on each side.

I do not understand the strangely active desire some people seem to have to erase the distinction between these two things, as if some advantage will result, as if sum types will somehow become more useful than they are or as if they will somehow lose their abilities if we don't also call them enumerations. There is no advantage to smudging these two unique things together. Not saying that you are promoting this per se, the__alchemist, just that I've seen it a lot and I don't get it. It's like someone wanting to claim that databases and files are really the same thing; well, sure, there's some overlap, but each does many things the other doesn't, and trying to squint until they actually are the same thing is generally the exact wrong direction to go to attain understanding.

To put it another way, when adding an "enumeration" into a network protocol, you allocate some fixed number of bits to hold a given sized integer. When you add "a sum type" into a network protocol, you have a lot more work to do in general.

To put it yet another way, enumerations have meaningful implementations of a ".Next()" that a sum type really doesn't. If you have a sensible implementation of a given method on one type of thing and it's not sensible on some other thing, then clearly they can not be the same thing.
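A minimal sketch of what such a `.Next()` might look like on a fieldless Rust enum. `Light`, `next`, and `walk_from` are hypothetical names, and stopping at the last value (rather than wrapping around) is just one possible choice.

```rust
// Hypothetical fieldless enum; every name here is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Light {
    Red,
    Green,
    Yellow,
}

impl Light {
    // Successor in declaration order; `None` after the last value.
    fn next(self) -> Option<Light> {
        match self {
            Light::Red => Some(Light::Green),
            Light::Green => Some(Light::Yellow),
            Light::Yellow => None,
        }
    }
}

// Walk the enumeration from a starting value using only `next`.
fn walk_from(start: Light) -> Vec<Light> {
    std::iter::successors(Some(start), |&l| l.next()).collect()
}
```

Once a variant carries a payload such as a `String` or a closure, there is no comparably obvious "next value" to return.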

(I say multiple times that a sum type doesn't have such an implementation in this message. By that I mean that while it is trivial to have a "data Color = Red | Green | Blue | RGB Int Int Int" and implement an iterator to walk through all possible values, it is not something that is generally done for all sum types, and if the sum type also includes functions or other complex values it isn't in general possible at all in common programming languages. Again, writing an iterator for "all possible functions" is perfectly theoretically possible, but in engineering terms not something anyone would actually do. All enumerations can be iterated.)

This still seems to point towards Rust's enums being both, no?

Example: For a network protocol, see the first code sample I posted.

For a `.Next()`, add the method. (I did this recently)

Regarding sum types not working in a network protocol, this again sounds like the wrapped enums. One way to do this is to use an integer for the enum variant at index 0, then conditionally assign the following bytes at a size appropriate to the wrapped type.
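Roughly, that tag-then-payload layout could be sketched like this. The `Message` type, its variants, the tag values, and the big-endian byte order are all assumptions made up for illustration, not any real protocol.

```rust
// Hypothetical message type; variants and wire layout are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum Message {
    Ping,          // no payload
    Level(u8),     // 1-byte payload
    Position(u32), // 4-byte payload
}

// Encode as one tag byte at index 0, followed by a payload whose
// size depends on which variant the tag names.
fn encode(msg: &Message) -> Vec<u8> {
    match msg {
        Message::Ping => vec![0],
        Message::Level(v) => vec![1, *v],
        Message::Position(p) => {
            let mut out = vec![2];
            out.extend_from_slice(&p.to_be_bytes());
            out
        }
    }
}

// Decode by reading the tag first, then the payload that tag implies.
// Unknown tags and wrong lengths are rejected.
fn decode(bytes: &[u8]) -> Option<Message> {
    match bytes {
        [0] => Some(Message::Ping),
        [1, v] => Some(Message::Level(*v)),
        [2, a, b, c, d] => Some(Message::Position(u32::from_be_bytes([*a, *b, *c, *d]))),
        _ => None,
    }
}
```

Note the encoded length varies from 1 to 5 bytes depending on the variant, whereas a plain `#[repr(u8)]` enum would always be exactly one byte.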

OK, so now you are advocating for erasing the distinctions.

Why? Why is it so important that they be seen as the same thing to you? What benefit is gained from it? What benefit is gained from blending together a data structure that is a fixed bit size with a family of data structures of variable size, a fairly fundamental difference? What benefit is gained from failing to consider the fundamentally different uses they are put to? What benefit is gained from looking at someone list a set of differences between the two, and basically saying, "yeah, they're different, but what if not?"

I can name further properties that differ between them. All sum types can embed arbitrary other existing sum types within themselves, without practical limit. Enumerations can not, because A: they may collide on which numbers they use and B: even if you remap them, you can run out of integers, especially with smaller values like byte-sized enumerations. Enumerations may have further structure within themselves, such that particular bits have particular meanings or values a certain number apart may have relationships to each other, or other arithmetic operations can be given some meaning; sum types themselves do not generally have any such relationships. (At least, I've never seen a sum type in which two clauses of the sum type are somehow related; that'd be bad design of a sum type anyhow. Even if you did this to an internal integer contained in a sum type, it would be that integer composed into the sum type that had that relationship, not the sum type.) Sum types have a rich concept of pattern matching that can be applied; enumerations generally do not (some languages can do some pattern matching with bits but there's still no deep structure matching concept in them).
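To make the embedding and deep-matching points concrete, here is a small sketch in Rust. `Shape`, `Event`, and `describe` are hypothetical names invented for this example.

```rust
// Hypothetical types for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Shape {
    Circle(f64),
    Rect(f64, f64),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    Quit,
    Draw(Shape),           // embeds another sum type wholesale
    Resize(Option<Shape>), // nests two levels deep
}

// One `match` looks through several layers of structure at once,
// and the compiler checks that every case is covered.
fn describe(e: Event) -> &'static str {
    match e {
        Event::Quit => "quit",
        Event::Draw(Shape::Circle(_)) => "draw circle",
        Event::Draw(Shape::Rect(w, h)) if w == h => "draw square",
        Event::Draw(Shape::Rect(..)) => "draw rect",
        Event::Resize(Some(_)) => "resize to shape",
        Event::Resize(None) => "resize reset",
    }
}
```

Embedding `Shape` inside `Event` needed no renumbering of anything, which is the contrast with merging two integer-backed enumerations.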

I mean, how many differences are necessary before they are not the same thing? They can not fit into the same amount of memory; one is fixed in size, the other highly variable. One is simple to serialize into memory, the other needs lots of complicated machinery. Each has operations generally valid on one but not the other (enumeration for enums; pattern matching and composition for sum types). The range of valid values (or domain, whichever you prefer) is not the same. There are languages that have enumerations without sum types, in that enumerations appeared in mainstream languages decades before sum types were a mainstream conversation. In what other ways could they be different?

It strikes me like arguing that ints and strings are the same, because honestly, what's the difference between 11 and "11" anyhow? Even if you're working in a language that strives to make the distinction as small as possible, you're still going to get in trouble if you believe they really are completely the same thing. And any programmer who goes through life truly thinking 11 and "11" are the same thing is in for a lot of confusion, as concepts they should be understanding as separate, even if at times superficially related, get treated as the same.

I'm not advocating for anything; I love Rust Enums and use whatever is close to them in other languages, which is usually better than the alternative of matching strings or similar (A convention in Python).

When I hear "These aren't really enums", my first reaction is to dive in and do research. (I'd been down this road before, probably after a similar HN comment...), but I haven't found usable or practical conclusions. It seems like the distinction is too subtle to be of use.

Stated more succinctly, let's call Rust enums "Choices", as I think this is causing semantic trouble. "Choices" are an excellent tool.

I'm looking at this from an engineering perspective; not a CS or abstract mathematics one.

I am curious what your pure Enum, and pure SumDataType look like in practice. I am also curious what existing implementations of either exist. Are they Haskell conventions?

That's because what you insist is the only thing deserving of the name "enum" is just a sum type of unit types.
Sum types are great, but I don't think they fit in Go's type system.
They also tend to require proper pattern matching to be particularly useful, something which I can't see being added given Go's design philosophy.
True Scotsman spotted!