Hacker News new | ask | show | jobs
by cubefox 32 days ago
Most people don't know that there are fundamentally two different kinds of "union" types: "tagged unions" and "untagged unions". Now .NET is introducing tagged unions, but unfortunately they stick to the popular tradition of calling them just "unions", which greatly adds to the confusion.

To clarify, these tagged unions are fundamentally different from the untagged unions that can be found in languages like Typescript or Scala 3.

Tagged unions (also called "discriminated unions" or "sum types") are algebraic data types. They act as wrappers, via special constructors, for two (or more) disjoint types and values. So a tagged union acts like a bag that has two (or more) labelled ("tagged") slots, where each slot has exactly one type and exactly one of these slot can take a value.

Untagged unions are set theoretic (rather than algebraic) data types. They don't require wrapping via a special constructor. They behave like the logical OR. They are not disjoint slots of a separate construct. A variable or function x with the type "Foo | Bar" can be of type Foo, or Bar, or both. To access a Foo method on x, one has to first perform a typecheck for Foo, otherwise the compiler will refuse the method call (since x might only have the type Bar which would produce an exception). If a variable is of type A, it is also of type A|B ("A or B"). There are also intersection types (A&B) which indicate that something has both types rather than at least one, and complement/negation types (~A indicates that something is of any type except A). Though the latter are not implemented in any major language so far.

2 comments

These are not what are commonly called tagged unions. It's actually closer to an untagged union.

The C# union does not store any discriminator. Just look at the implementation - it's a single `object?` field.

The discriminative part is handled by run-time type information, which is stored in the object itself, not the union - which is why the C# built-in implementation requires boxing.

Also, you can use the same class/record type ("discriminator") in several different unions - again, a feature which ADTs/sum-types/tagged unions in most functional languages do not have.

You can even store one single object (ie. identical by reference equality) in several different union values at the same time, theoretically, which in combination with mutability is... uhh, certainly not common functional/mathematical semantics.

Underlyingly, all unions are the same concept, exposed by C's `union` keyword, namely overlapping memory (as opposed to `struct`s, which are sequential memory). You can add a discriminator tag to your unions, your call. If it is impossible for there to be ambiguity (e.g. you track the state somewhere else, and you're dealing with a large array), it may be redundant / memory inefficient to tag every item.

Having these tags be a language construct is just a DX feature on top of unions. A very handy one, but it doesn't make tagged and untagged unions spring from different theoretical universes. I enjoy the ADT / set-theoretic debate as much as the next PL nerd, but theory ought to conform to reality, not vice versa.

That's like saying Haskell is really an imperative language just because it's written in C. But it doesn't matter how it is implemented, it matters how it behaves on the surface.
> That's like saying Haskell is really an imperative language just because it's written in C

I honestly don't even remotely understand how you got to that from what I wrote.

Unions already have a definition. You cannot go around claiming that technically unions are not unions (and that "most people don't know" this), because theory X or Y overloads the term union to refer to some unrelated concept. That's fine and well and good, within that theory, but it does not displace the usual CS definition of union. It is a load-bearing definition.

A tagged union is so called because it is a union (overlapping memory) with a field (a tag) containing a discriminator (aka discriminated union). An untagged union is a union without this tag. These terms didn't come out of nowhere. They are transparent descriptors of what's happening.

It doesn't matter whether Haskell is implemented in C or Ruby, and it doesn't matter whether it's functional, object-oriented, or a DSL for modding Elden Ring, none of this changes what a union is. I'm very supportive of type theories exploring the option space for type constructs, but someone's going to have to implement these somehow. Such as with (tagged) unions. You know, that CS concept we have, with a settled and well understood meaning.

To be fair to you, I think you're letting a very narrow definition of 'union' shadow a much more usual and common definition of the word. And that's totally fine if that's the definition pertinent to you and your work, but if that's the case, maybe don't go around pontificating about how most people do not understand unions?

> You cannot go around claiming that 'technically' unions are not unions

I never said this. I said that there are two different kinds of unions with quite different definitions and behaviors.

> A tagged union is so called because it is a union (overlapping memory) with a field (a tag) containing a discriminator (aka discriminated union). An untagged union is a union without this tag.

That alone doesn't sufficiently describe the "unions" in set theoretic type systems like TypeScript. As I said, these unions are not disjoint and don't involve the special constructor that tagged unions/discriminated unions/sum types in algebraic type systems do. They also have logical implications like A implying A|B. Or A|A being equivalent to A, which isn't the case for discriminated/tagged unions. Maybe they are implemented similarly under the hood, but that doesn't negate the difference.

Edit: It's probably fair to say that "untagged unions" in languages like TypeScript, Ceylon or Scala 3 (set theoretic type systems) are different from "untagged unions" in C. It only adds to the confusion...

> Maybe they are implemented similarly under the hood

It's hard to know what to say to this. Tagged unions are not tagged unions, even if they are literal tagged unions? What?

I reiterate what I said in my previous comment, you're not using the ordinary definition of the term union, and this is causing confusion. A union may or may not be a "union" as understood within various academic type theories, that really depends on how any given theory defines that word, which can be any way it wants. But a union is a CS concept with a clearly understood meaning, and when used without added context to suggest it is to be interpreted in some theoretical way, it is understood in that ordinary way.

OP's article is clearly using 'union' to mean tagged unions - he even shows off their implementation, with a tag. The author assumes that his audience will understand what he's talking about when he uses the word union, and it's not causing anyone trouble in this comments section. The fact that alternative definitions within various theoretical paradigms is very nice, bless their hearts, but not really relevant.

You may prefer other definitions to the usual CS definition, that's certainly your prerogative, but - again - that's hardly grounds for taking an article and comment section that's using the commonplace meaning, and appearing to lecture others for failing to adhere to your idiosyncratic standards for what a union must be.

See my edit above. There is the C terminology, and the TypeScript et al terminology, and it's the same "untagged" terminology for two different things. In any case, both these kinds of "untagged" unions are different from tagged/discriminated unions. So just calling them "unions" is ambiguous at best and confusing at worst.