by alextheparrot 2102 days ago
Can you give an example of real-world data modeling where you'd want more expressive sum types rather than just enums? Enums are technically a special case of sum types, but even those are non-trivial to use at the data-format level (try evolving them in an on-the-wire message format like Avro or Protobuf).

Imagine a system that allows third-party login (Facebook, Apple ID, whatever): each account then has either a username/password hash, an OAuth token, or some other kind of structured data.
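
A minimal Rust sketch of the login case, assuming hypothetical `Credentials` and `Account` types (not taken from any real system): each account holds exactly one credential variant, and `match` forces every variant to be handled.

```rust
// Hypothetical account model: each account stores exactly one kind of credential.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Credentials {
    // Classic login: a username plus a password hash (never the plain password).
    Password { username: String, password_hash: String },
    // Third-party login: which provider issued the token, and the token itself.
    OAuth { provider: String, token: String },
}

#[derive(Debug, Clone)]
struct Account {
    id: u64,
    credentials: Credentials,
}

// If a new Credentials variant is added, this match stops compiling until it is handled.
fn login_method(account: &Account) -> String {
    match &account.credentials {
        Credentials::Password { username, .. } => format!("password ({username})"),
        Credentials::OAuth { provider, .. } => format!("oauth via {provider}"),
    }
}
```

Adding a third login method later turns every non-exhaustive `match` like the one in `login_method` into a compile error, which is the exhaustiveness check a plain tag column lacks.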

Delivery addresses for a system that supports both physical and digital products: you want a type-level distinction between physical and digital addresses, but an order might be shipped to either.
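
The same idea sketched in Rust for delivery addresses (all type and function names here are hypothetical): the physical/digital split is a type-level distinction, so a function that prints parcel labels can only ever receive a `PhysicalAddress`, while an `Order` can still point at either kind.

```rust
// A physical address for parcels (fields simplified for illustration).
#[derive(Debug, Clone, PartialEq)]
struct PhysicalAddress {
    lines: Vec<String>,
    country: String,
}

// Assumption: a digital "address" is just an email for download links.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct DigitalAddress {
    email: String,
}

// An order ships to exactly one of the two kinds of address.
#[derive(Debug, Clone, PartialEq)]
enum DeliveryAddress {
    Physical(PhysicalAddress),
    Digital(DigitalAddress),
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Order {
    id: u64,
    ship_to: DeliveryAddress,
}

// Only accepts physical addresses, so a digital one can't be passed by mistake.
fn print_shipping_label(addr: &PhysicalAddress) -> String {
    let mut label = addr.lines.join("\n");
    label.push('\n');
    label.push_str(&addr.country);
    label
}

// Returns a label only for orders that actually need a courier.
fn label_for(order: &Order) -> Option<String> {
    match &order.ship_to {
        DeliveryAddress::Physical(p) => Some(print_shipping_label(p)),
        DeliveryAddress::Digital(_) => None,
    }
}
```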

Subscription vs. free trial: they're different kinds of thing, but you want to store more details (e.g. an expiry date) than just an enum saying which one it is.
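
A sketch of the trial/subscription case, assuming a simplified `Day` counter instead of a real date type and an invented rule that subscriptions stay active until cancelled (cancellation is not modeled): each variant carries only the fields that make sense for it.

```rust
// Days since some epoch; a real system would use a proper date type.
type Day = u32;

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    // A trial carries its own expiry date and nothing else.
    FreeTrial { expires_on: Day },
    // A paid subscription carries billing details a trial never has.
    Subscription { renews_on: Day, price_cents: u32 },
}

impl Plan {
    // Whether the customer has access on the given day.
    fn is_active(&self, today: Day) -> bool {
        match self {
            Plan::FreeTrial { expires_on } => today < *expires_on,
            // Assumption: subscriptions stay active until cancelled.
            Plan::Subscription { .. } => true,
        }
    }

    // When and how much to charge next; a free trial never charges.
    fn next_charge(&self) -> Option<(Day, u32)> {
        match self {
            Plan::FreeTrial { .. } => None,
            Plan::Subscription { renews_on, price_cents } => Some((*renews_on, *price_cents)),
        }
    }
}
```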

>Imagine a system that allows third-party login (Facebook / AppleID / whatever) - then the account has either a username/password hash or an oauth token or some other kind of structured data.

What I've seen most often is that you have to deal with account merging, but let's say you really do want either/or...

Wouldn't you just have your third-party tables (each with its own idiosyncrasies) and, in your user table, login_type and login_id columns? The type tells you which table to hit, and the id tells you which row?
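
For comparison, a rough Rust model of the login_type/login_id layout, with in-memory `HashMap`s standing in for the third-party tables (all names are hypothetical). Dispatch happens on a string tag, so the compiler can't tell whether every tag is handled, and nothing guarantees the id exists in the chosen table.

```rust
use std::collections::HashMap;

// Hypothetical per-provider tables, each with its own columns.
struct FacebookLogin {
    fb_user_id: String,
}
struct PasswordLogin {
    username: String,
}

// The user row stores only a type tag and an id; neither is checked against the tables.
struct UserRow {
    login_type: String,
    login_id: u64,
}

struct Db {
    facebook: HashMap<u64, FacebookLogin>,
    password: HashMap<u64, PasswordLogin>,
}

// Pick the table by string tag, then look up the row by id.
fn login_name(db: &Db, user: &UserRow) -> Option<String> {
    match user.login_type.as_str() {
        "facebook" => db.facebook.get(&user.login_id).map(|f| f.fb_user_id.clone()),
        "password" => db.password.get(&user.login_id).map(|p| p.username.clone()),
        // Any unknown or misspelled tag falls through silently.
        _ => None,
    }
}
```

A misspelled tag and a dangling id both come back as `None`, indistinguishable from each other.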

> Wouldn't you just have your third party tables (each with their own idiosyncrasies) and in your user table you'd have login_type and login_id columns? You know which table to hit by type using the id?

You can do that, but it's a bodge. For example, you won't be able to have the foreign-key constraint you'd normally have on that login_id column, since it points into a different table depending on login_type. And good luck writing a query that actually does something different for each case: you'd have to do something like multiple left joins, and there's no way to check that you've handled every case and not handled one of them twice.

You can have a table or view of all the ids across your implementations and FK into that.

As for how you'd model it in the application, in this case you can just normalize across all the possible columns and have the ORM build out your mapped object.

> You can have a table or view of all the ids across your implementations and FK into that.

That's just moving the problem around. That table would have to have a bunch of nullable columns, and there's no way to express the constraint that either these columns are non-null or those columns are non-null, or these columns are non-null when this enum has this value.
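
A sketch of the nullable-columns shape and what reading it costs, with hypothetical `LoginRow` and `Login` types: the row type can represent invalid combinations, so application code has to reject them at runtime before it gets a value of the sum type `Login`, which cannot represent them at all.

```rust
// Tag plus one nullable column per provider, as in a combined table or view.
#[derive(Debug, Clone, PartialEq)]
enum LoginKind {
    Password,
    OAuth,
}

#[derive(Debug, Clone)]
struct LoginRow {
    kind: LoginKind,
    password_hash: Option<String>,
    oauth_token: Option<String>,
}

// The sum type: only the valid combinations exist.
#[derive(Debug, Clone, PartialEq)]
enum Login {
    Password { hash: String },
    OAuth { token: String },
}

// "The column matching the tag is set and the others are null" is checked here, at runtime.
fn parse_row(row: LoginRow) -> Result<Login, String> {
    match (row.kind, row.password_hash, row.oauth_token) {
        (LoginKind::Password, Some(hash), None) => Ok(Login::Password { hash }),
        (LoginKind::OAuth, None, Some(token)) => Ok(Login::OAuth { token }),
        (kind, hash, token) => Err(format!(
            "invalid row: kind={:?}, password_hash set={}, oauth_token set={}",
            kind,
            hash.is_some(),
            token.is_some()
        )),
    }
}
```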

I'm guessing you've never seriously used a language with first-class sum types. Yes, you can use hacks to represent sum-typed data in languages that don't have proper sum types, but it's always going to be a hack. It's like saying C has OO support because you can always construct virtual function tables by hand.

How do you model "postal address"? Some postal addresses are PO boxes, some are street addresses, etc. There are canonical representations of these different cases. Do we just shove it all into a string and let the application perform domain validation?
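
One way the PO box / street address split might look as a Rust enum. The fields are a deliberately simplified assumption (real formats vary widely by country, and even the house-number/street order differs), but fields shared by every variant, like the postal code, can still be read through one accessor.

```rust
// Hypothetical, simplified shapes; real formats vary a lot by country.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum PostalAddress {
    PoBox { box_number: String, city: String, postal_code: String },
    Street { house_number: String, street: String, city: String, postal_code: String },
}

impl PostalAddress {
    // A field present in every variant can be read uniformly with an or-pattern.
    fn postal_code(&self) -> &str {
        match self {
            PostalAddress::PoBox { postal_code, .. }
            | PostalAddress::Street { postal_code, .. } => postal_code.as_str(),
        }
    }

    // Variant-specific formatting (assumes "number street" order, which is not universal).
    fn first_line(&self) -> String {
        match self {
            PostalAddress::PoBox { box_number, .. } => format!("PO Box {box_number}"),
            PostalAddress::Street { house_number, street, .. } => format!("{house_number} {street}"),
        }
    }
}
```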

Don't even try to validate postal addresses. The postal system has so many corner cases that you'll never be able to handle all of them correctly. Every mishandled corner case will cost you, or your counterparty, time and money.

Just dump addresses into a Unicode string and let the postal system figure it out.

Postal address is one of those cases where you probably do just want to shove it all into a string, as most structural constraints eventually backfire, especially if you support international addresses: http://www.columbia.edu/~fdc/postal/

The most common schema I've seen is usually something like line1, line2, line3, city, state, country, zip, etc. If it's a reporting database, then city/state/country/zip is often mashed into some sort of location id.

Watch out for `state` as well: please don't make it mandatory like so many websites do. No, a French region is not the equivalent of a state, and you don't need it for my package to get there!
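
A sketch of the line1/line2/line3 + city/state/country/zip layout in Rust, assuming hypothetical field names and a hypothetical `missing_fields` check: `state` and `zip` are `Option`s, so only the fields assumed to be universal (itself a simplification) are required.

```rust
// Roughly the freeform-lines layout, with state and zip optional.
#[allow(dead_code)]
#[derive(Debug, Clone, Default)]
struct Address {
    line1: String,
    line2: Option<String>,
    line3: Option<String>,
    city: String,
    // Optional: many countries have no state on a postal address.
    state: Option<String>,
    // Optional too: not every country uses postal codes.
    zip: Option<String>,
    country: String,
}

impl Address {
    // Assumption: only line1, city, and country are required everywhere.
    fn missing_fields(&self) -> Vec<&'static str> {
        let mut missing = Vec::new();
        if self.line1.trim().is_empty() {
            missing.push("line1");
        }
        if self.city.trim().is_empty() {
            missing.push("city");
        }
        if self.country.trim().is_empty() {
            missing.push("country");
        }
        missing
    }
}
```

With this check, a French address with no state passes validation.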

Each type of postal address is a separate column, and new postal address "types" get new columns. This works particularly well when an address can have both a PO box and a street address. For this particular case, it's actually more flexible than tagged unions/sum types.
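
A rough Rust version of the one-column-per-type layout, with hypothetical `PoBox`, `StreetAddress`, and `Addresses` types: both columns can be filled at once, but "no address at all" is also representable and has to be checked.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct PoBox {
    number: String,
    postal_code: String,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct StreetAddress {
    street: String,
    postal_code: String,
}

// One optional "column" per address type; a new type means a new field.
#[derive(Debug, Clone, Default)]
struct Addresses {
    po_box: Option<PoBox>,
    street: Option<StreetAddress>,
}

impl Addresses {
    // Every postal code on file, whichever columns happen to be filled in.
    fn postal_codes(&self) -> Vec<&str> {
        let mut codes = Vec::new();
        if let Some(p) = &self.po_box {
            codes.push(p.postal_code.as_str());
        }
        if let Some(s) = &self.street {
            codes.push(s.postal_code.as_str());
        }
        codes
    }

    // The flip side of the flexibility: an empty record is also valid.
    fn is_empty(&self) -> bool {
        self.po_box.is_none() && self.street.is_none()
    }
}
```

A sum-type design would usually get the same "several addresses" flexibility from a list of enum values rather than one optional field per type.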

So a PO box address or a house address would be concatenated into a string value and then stored in either Addresses.POBoxAdress or Addresses.House? It's still not structured. How can someone easily get the postal code/ZIP code?

I think they may mean that the result set has elements of different types. For instance, if you stored restaurants by genre but wanted a list of all restaurants that retains each genre's unique fields, you currently need to generate the product type of the genres (one wide row with every genre's columns).
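
A small Rust sketch of the restaurant example, with hypothetical genres and fields: one `Vec<Restaurant>` holds every genre, and each element keeps its own genre-specific fields instead of a wide row with every genre's columns, most of them empty.

```rust
// Genre-specific fields live in the enum; shared ones live on the struct.
#[derive(Debug, Clone, PartialEq)]
enum Genre {
    Pizzeria { wood_fired_oven: bool },
    Sushi { has_conveyor: bool },
}

#[derive(Debug, Clone, PartialEq)]
struct Restaurant {
    name: String,
    genre: Genre,
}

// Each element is described using only the fields its genre actually has.
fn describe(r: &Restaurant) -> String {
    match &r.genre {
        Genre::Pizzeria { wood_fired_oven } => {
            format!("{}: pizza, wood-fired={}", r.name, wood_fired_oven)
        }
        Genre::Sushi { has_conveyor } => format!("{}: sushi, conveyor={}", r.name, has_conveyor),
    }
}
```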