by oever 1010 days ago
This library can be used to create string class hierarchies. That, in turn, can encourage wider use of typed strings.

For example, e-mails and URLs each have a special syntax. Their value space is a subset of all non-empty strings, which is in turn a subset of all strings.

An e-mail address could be passed into a function that requires a non-empty string as input. When the type system knows that an e-mail string is a subclass of non-empty string, it knows that an e-mail address is valid input for that function.

This library can be used to check the definitions and hierarchy of such string types. The implementation of the hierarchy differs per programming language (subclassing, trait boundaries, etc).
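
As a rough sketch of what such a hierarchy might look like in Haskell, which has no subclassing, the "subclass" relation can be written as a total conversion function. All names here (`NonEmptyString`, `Email`, `mkEmail`, `emailToNonEmpty`, `greeting`) are made up for illustration and are not part of the library, and the e-mail check is a deliberately naive placeholder.

```haskell
-- Hypothetical string types; the names are illustrative, not from the library.
-- In a real module you would export the types but not their constructors.
newtype NonEmptyString = NonEmptyString String deriving (Show, Eq)
newtype Email = Email String deriving (Show, Eq)

-- Smart constructor: rejects the empty string.
mkNonEmpty :: String -> Maybe NonEmptyString
mkNonEmpty "" = Nothing
mkNonEmpty s  = Just (NonEmptyString s)

-- Placeholder check only: exactly one '@' with text on both sides.
-- Real e-mail validation is far more involved.
mkEmail :: String -> Maybe Email
mkEmail s = case break (== '@') s of
  (_ : _, '@' : domain@(_ : _))
    | '@' `notElem` domain -> Just (Email s)
  _ -> Nothing

-- Every Email is non-empty by construction, so widening never fails.
emailToNonEmpty :: Email -> NonEmptyString
emailToNonEmpty (Email s) = NonEmptyString s

-- A function that only requires a non-empty string.
greeting :: NonEmptyString -> String
greeting (NonEmptyString s) = "Hello, " ++ s
```

With these definitions, `greeting . emailToNonEmpty` accepts any `Email` without re-checking for emptiness. Whether `Email` really is a subset of `NonEmptyString` is the kind of claim the library is meant to check.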

4 comments

In languages with tagged union types you do this a lot! Some Haskell pseudocode for ya

    module Email (Address, fromText, toText) where -- note we do not export the constructor of Address, just the type

    import Data.Text (Text)
    import qualified Data.Text as T

    data Address = Address Text

    fromText :: Text -> Maybe Address
    fromText raw
        -- you'd do your validation in here and return Nothing if it's a bad address.
        -- Signal validity out of band, not in band with the data.
        | T.any (== '@') raw = Just (Address raw) -- placeholder check, not real validation
        | otherwise = Nothing

    toText :: Address -> Text
    toText (Address addr) = addr -- for when you need to output it somewhere
Pedantic note: ‘Address’ should really be a ‘newtype’…
Haha sorry, I get those backwards a lot. I was gonna do Elm but then it’d be a conversation about why we’re writing our own email address validation on the front end instead of using the platform.
Don't worry, that's normal -- in this forum we only talk about how good obscure languages are, nobody actually uses Haskell.
I figure you're joshing but I literally write it for work (although I haven't been in our Haskell codebase in months, tragically). I just have a particularly smooth brain so I forget all the little differences as soon as I'm done. Always in exams mode.
> Signal validity out of band, not in band with the data.

Could you expand on this?

Sure! Sorry, that was a little too cryptic. So in this case we can imagine an app where we don't use any tagged unions and just use primitive types (your strings, booleans, integers, things of that nature). And we want to signal the validity of some data. Say a user ID and an email address. We store the user ID as an integer to keep space down and store the email address as a string. We use sentinel values: if the user ID is invalid we store -1 (it's JS and there are no unsigned numbers) and if the email address is invalid we store the empty string.

Whenever we consume these values, we need to make sure that userId > 0 and email != "" I mean email !== "". We are testing for special values of the data. Data and "this is for sure not meaningful data" are the same shape! So your functions need to handle those cases.

But with tagged unions you can check these things at the edge of the program and thereafter accept that the contents of the tagged data are valid (because you wrote good tests for your decoders).

So your data is a different shape when it's valid vs when it's invalid, and you can write functions that only accept data that's the valid shape. If you got JSON that was hit by cosmic rays when trying to build your User model, you can fail right then, skip building the model, and find a way to handle that.

It's out of band because you don't guard for special values of your morphologically identical data.
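
To make the contrast concrete, here is a minimal sketch of the two styles side by side. The types and functions (`RawUser`, `User`, `decodeUser`, `emailOf`) are hypothetical names for illustration, and the checks are placeholders standing in for real validation.

```haskell
-- In-band: validity is encoded as special values of the same type.
-- Every consumer must remember to test for -1 and "".
data RawUser = RawUser { rawId :: Int, rawEmail :: String }

-- Out-of-band: a User value only exists if the checks passed.
newtype UserId = UserId Int deriving (Show, Eq)
newtype EmailAddress = EmailAddress String deriving (Show, Eq)
data User = User UserId EmailAddress deriving (Show, Eq)

-- Decode once, at the edge of the program. Nothing means "no valid user".
decodeUser :: RawUser -> Maybe User
decodeUser raw = User <$> parseId (rawId raw) <*> parseEmail (rawEmail raw)
  where
    -- Placeholder rule: IDs must be positive.
    parseId n
      | n > 0     = Just (UserId n)
      | otherwise = Nothing
    -- Placeholder rule: the address must not be empty.
    parseEmail s
      | null s    = Nothing
      | otherwise = Just (EmailAddress s)

-- Downstream code takes a User and needs no sentinel checks.
emailOf :: User -> String
emailOf (User _ (EmailAddress e)) = e
```

The only place that ever looks at -1 or the empty string is `decodeUser`. Everything after it works with `User` and cannot be handed the "invalid" shape by accident.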

If you want examples of any specific part of this let me know. IDK your level of familiarity and don't want to overburden you with things you already get.

>An e-mail address could be passed into a function that requires a non-empty string as input. When the type-system knows that an e-mail string is a subclass of non-empty string, it knows that an email address is valid.

Don't use regex for email address validation

https://news.ycombinator.com/item?id=31092912

Nothing like a dive into the wondrous world of what is and isn't allowed in an email address left of the @ on a warm late-summer morning. It's one of the mysteries of the modern world. The simple heuristic that proposes that every regex trying to express "valid email address" is wrong is a sufficiently safe bet, but it ruins all the fun.
> Their value space...

wossis mean? TIA

Edit: instead of downvoting try answering. I'd like to know. TIA{2}

People are downvoting you because quirky/jokey super-colloquial language like “wossis mean? TIA” is hard to understand, and also just doesn’t really mesh with the vibe of the site.
What does TIA even mean?
Thanks In Advance.
That Is Amazing.
Value space is the set of values a type can have. A boolean has only two values in its value space. An unsigned byte has 256 possible values, and so does a signed byte.

A string enumeration has a limited number of values. E.g. type A ("Yes" | "No" | "Maybe") has three values and is a superset of type B ("Yes" | "No"). A function that accepts type A can also accept type B as valid input.
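
A small sketch of that superset relation, assuming two made-up enumerations that mirror type A and type B. Haskell has no implicit subtyping, so the "B is accepted where A is expected" step is written out as an explicit, total `widen` function. All names below are illustrative.

```haskell
-- Hypothetical enumerations mirroring type A and type B above.
data A = AYes | ANo | AMaybe deriving (Show, Eq, Enum, Bounded)
data B = BYes | BNo deriving (Show, Eq, Enum, Bounded)

-- The value space of a finite type: every value it can take.
valueSpace :: (Enum a, Bounded a) => [a]
valueSpace = [minBound .. maxBound]

-- Every B is also an A, so widening is total (no Maybe needed).
widen :: B -> A
widen BYes = AYes
widen BNo  = ANo

-- Going the other way can fail, because "Maybe" has no counterpart in B.
narrow :: A -> Maybe B
narrow AYes   = Just BYes
narrow ANo    = Just BNo
narrow AMaybe = Nothing

-- A function written against A...
describe :: A -> String
describe AYes   = "Yes"
describe ANo    = "No"
describe AMaybe = "Maybe"

-- ...accepts any B after widening.
describeB :: B -> String
describeB = describe . widen
```

Here `valueSpace :: [A]` has three elements and `valueSpace :: [B]` has two. The asymmetry between `widen` (always succeeds) and `narrow` (may return `Nothing`) is the subset relation expressed in types.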

If the value space is defined by a regular expression, as is often the case, the mentioned library could be used to check, at compile time, which types are subsets of others.

Thank you. I guess I misread.

"For example, e-mails and urls are a special syntax. Their value space..." seemed to talk about the 'value space' of strings (these being e-mails and urls), not types (of e-mails and urls), which confused me.

It is about the 'value space' of strings. Think of all possible strings. That is the entire value space of strings. Not every possible string is an email. Only a subset of this value space is a valid email. This subset is the 'value space' of strings which are valid emails.
If I hadn't seen your edit, I might have downvoted the comment for not being intelligible.