Hacker News new | ask | show | jobs
by AnimalMuppet 2405 days ago
In what sense are integers and booleans sum types? What is the definition of sum types that booleans fit? Is it just that it can be true or false, and because there's the "or", it's a sum type?
6 comments

> In what sense are integers and booleans sum types?

Natural = Zero | Successor(Natural)

Bool = True | False

The mapping from Natural to Integer is standard.

Assuming non-fixed width integers, how can you have an infinite sum type?
There’s nothing wrong with an infinite sum, in principle. You can’t usually define them unless you can use recursion. In the case of integers you can do

    data Int = Positive Nat | Negative Nat
    data Nat = Zero | Succ Nat
Though, that has 2 zeros, a positive and a negative one, but zero should be neither. To avoid that, you can do:

  data Int = Zero | NonZero Sign PositiveInt
  data PositiveInt = One | Succ PositiveInt
  data Sign = Positive | Negative
Good point.
You can't. The Natural number definition combines a sum type with a recursive position. Or in other words, a finite tag (either zero or successors, 2 options) plus a pointer to the next value in the case of successor.
> Is it just that it can be true or false, and because there's the "or", it's a sum type?

You can rote learn that, but the reason why it's intuitive to call them sum types is because you're adding the number of possible values. With product types, you multiply the number of possible values that the type represents. Consider these types:

  -- 1 + 1 = 2 possible values
  data Bool = True | False

  -- 1 + 1 * a = 1 + a possible values
  data Maybe a = Nothing | Just a

  -- 1 * a + 1 * b = a + b possible values
  data Either a b = Left a | Right b

  -- 1 * a * b possible values
  data Pair a b = Pair a b

  -- 1 * a * b * c possible values
  data Triple a b c = Triple a b c
`Maybe Bool` would have 1 + 2 = 3 possible values.

`Either (Pair Bool Bool) (Maybe Bool)` would have (2 * 2) + (1 + 2) = 7 possible values.

Integers would be defined as:

  data Int = ... | -2 | -1 | 0 | 1 | 2 | ...
if they were finite, but since they're not, they'd have to be defined recursively with something like:

  data PositiveInteger = One | Succ PositiveInteger
  data Integer = Zero | NonZero Bool PositiveInteger
where the Bool represents the sign.

Since 2 * x is the same as x + x, we can also define it as:

  data Integer = Zero | NonZero (Either PositiveInteger PositiveInteger)
where Left of Either would represent negative integers, and Right of Either would represent positive integers.
Summing is the act of gluing two components together with “or”. It’s possible to see all types as sums in this fashion (this is essentially “distributivity of addition over multiplication) but it’s also impossible to see char/int/bool as not having some sense of “or” within it.

For a product type with no “or”, consider the JavaScript object {}. We see these as “atomic”.

A boolean is either true or false - i.e., it's the type inhabited only by "True" plus the type inhabited only by "False". An integer is either 1, or 2, or 3, or 4, or..., i.e., it's the countably infinite sum of other types.
First, here is a note about the notation that I will use here: Let `1` denote the unit type, the type with one inhabitant. Let `()` denote the inhabitant of the unit type. Let `a + b` or `Either a b` denote the sum of types `a` and `b`. Let `Left` and `Right` be the constructors of the sum type.

Before defining the integers, one must define the natural numbers. Natural numbers are defined as an inductive type, which is a little different from an ordinary sum type:

    data Nat = Z | Suc Nat
Here is another way of defining them. Start by removing the self-reference:

    data NatF a = Z | Suc a
Notice that NatF is the same thing as the Maybe/Option type, AKA 1+a, AKA the sum of the unit type and a.

Given a "fix" type:

    data Fix f = MkFix (f (Fix f))
the definition of the natural numbers would be:

    type Nat = Fix NatF
An "F-algebra" is a way of defining some algebraic structure or data type as a map from a "sum of products" to the type. More precisely, an F-algebra consists of an "object" (type) A and a map F(A) -> A. For the natural numbers, the F is NatF and the map would be a function mkNat : NatF Nat -> Nat, AKA mkNat : (Maybe Nat) -> Nat or mkNat : (1 + Nat) -> Nat.

Related links:

- https://ncatlab.org/nlab/show/inductive+type

- https://ncatlab.org/nlab/show/natural+numbers+object

- https://ncatlab.org/nlab/show/initial+algebra+of+an+endofunc...

- https://en.wikipedia.org/wiki/F-algebra

---

One way of defining the integers is as the product of the natural numbers with themselves, or Nat * Nat, equipped with an equivalence relation ~ where

(a, b) ~ (c, d) := a + d = b + c

(a, b) can be understood as a - b. Of course, a - b = c - d <-> a + d = b + c.

Another way of defining the integers is as:

    data Int = Neg Nat | Zero | Pos Nat
where 0 = Zero, -1 = Neg Z, and +1 = Pos Z.

---

Booleans can be defined as 1 + 1. Left () = True and Right () = False, or vice versa.

I'm guessing at least a majority of the readers are like me and don't know Haskell, so on behalf of the confused:

  data Nat = Z | Suc Nat
^ What does this crazy construction mean syntactically?

It looks like it is trying to do something like (from your link) Coq's

  Inductive nat : Type :=
   | zero : nat
   | succ : nat -> nat.
But obviously defining natural numbers in terms of Z/the integers doesn't work because I could then call -1 a natural number.
You can't have negative numbers with that definition. Here are numbers and the way you would write them with that definition:

  0 == Z
  1 == Suc Z
  2 == Suc (Suc Z)
  3 == Suc (Suc (Suc Z))
You can define some arithmetic operators for it:

  data Nat = Z | Suc Nat deriving (Show, Eq)

  instance Num Nat where
    (Suc x) + y = Suc (x + y)
    Z + y = y

    (Suc x) - (Suc y) = x - y
    x - Z = x
    Z - (Suc _) = undefined

    Z * y = Z
    (Suc x) * y = (x * y) + y

    negate = undefined

    abs x = x

    signum (Suc _) = Suc Z
    signum Z = Z

    fromInteger 0 = Z
    fromInteger x
      | x < 0 = undefined
      | otherwise = Suc (fromInteger (x - 1))
The definition `fromInteger` is used by GHC to implicitly convert decimal numbers in the code to our Nat. We can use it like this:

  ghci> 0 == Z
  True
  ghci> 1 == Suc Z
  True
  ghci> 2 == Suc (Suc Z)
  True
  ghci> 3 :: Nat
  Suc (Suc (Suc Z))
  ghci> 3 + 2 :: Nat
  Suc (Suc (Suc (Suc (Suc Z))))
  ghci> 3 - 2 :: Nat
  Suc Z
  ghci> 3 * 2 :: Nat
  Suc (Suc (Suc (Suc (Suc (Suc Z)))))
  ghci> 3 - 4 :: Nat
  *** Exception: Prelude.undefined
  CallStack (from HasCallStack):
    error, called at libraries/base/GHC/Err.hs:78:14 in base:GHC.Err
    undefined, called at <interactive>:70:15 in interactive:Ghci8
In my code, the Z doesn't mean integer; it stands for zero. Nat is an inductive type, and Z is the base case. Z has type Nat.
The “or” makes it a sum type.
Doesn't that make everything (other than "bottom" or "unit") a sum type?
Sure, in the same sense that every set is a disjoint union over singletons of all its values. But whether you would or could naturally specify it that way or not depends on the set. You likely wouldn't sit down and specify the type "Function from natural numbers to Boolean" as a big OR over each possible such function, for example.

But yes, whether something is a sum type or a product type or whatever is not generally so interesting as a predicate about the type in itself. It's a relationship between this type and other types. The same way it is not interesting to say "7 is a sum" (everything is a sum); what's interesting is to observe "7 is the sum of 3 and 4".

[Disclaimer: In the same way that some topological spaces are connected and cannot be decomposed into separate disconnected components, one might sometimes reason that some types are connected and not reasonably represented as a disjoint unions over subtypes. So "Everything is a sum" is not always correct in all contexts. But let's ignore this sort of pedantry for now.]

Here's what's bugging me: Integers are a disjoint union of values, which is (in my non-abstract-algebra mind) something different than a disjoint union of types. That is, something like Maybe Integer is a "sum type" in a completely different way from the way that Integer is a "sum type". To me, this is different enough that a different term should be used.

I'm sure you're going to tell me that I'm not thinking abstractly enough...

There are sums at the type and value level. If I define Color = Red | Blue | Green, that’s not a use of a type which sums types together, but instead just is a type composed of a sum of values.

So a “sum type” can be any type which is the sum of different values. On the other hand, Either is a canonical “sum type (constructor)” which sums two types together (via the summation of their vaues).

If it weren't for finiteness restrictions, you could write the following in Haskell and it'd make a natural number type that acted basically as expected.

data Zero = Zero data One = One data Two = Two ... data NaturalNumber = Zero | One | Two | Three | ...

Whatever you would say about sets, you might as well say the same thing about types; the distinction in terminology doesn't amount to much. The natural numbers are the disjoint union of {0}, {1}, {2}, {3}, ... . They're also the disjoint union of the even naturals and the odd naturals, or the disjoint union of {0}, {1}, primes, and composites. There are a million ways in which they are a disjoint union.

Whoops, all the newlines were lost here, oh well.
I don't think it's your level of abstraction, I think it's that you are treating as unrelated two sides of the same coin. This sum type / variant / disjoint union / coproduct construction operates both at the level of types and the level of values. After all, we only care about types insofar as they tell us about values.
So Integer (0 or 1 or 2 or...) is a sum type in the same sense that Maybe Integer (Nothing or 0 or 1 or 2 or...) is. I see. That makes some sense.

So if I'm in Haskell, say, and I've got a Maybe Integer, and I'm trying to walk through the options (pattern matching), I usually see the options as Nothing or Just Integer. Can I instead do Nothing or Just 0 or... "Just some other Integer"? That is, can I treat... um, not sure how to say this in light of your post... "individual values of a particular named type" the same way as I can treat "different named types"?