by kingdomcome50 1781 days ago
I think you are confusing implementation with behavior. That is, I am achieving that same result through different means. I am mostly uninterested in specifically how the configuration string is parsed. It's not really important.

What is important is that we know we will have to deal with the possibility of something not existing. That is where the complexity lies, and where we want to take care to make our program as sensible as possible. Validating your input to throw an exception or return is one way to satisfy the compiler, another way is to use `Maybe` as intended. The author's "solution" is simply a poor illustration of parsing over validation (read that sentence again).

I suspect, and this applies to you as well, that they are just not comfortable working with the `Maybe` construct. Adding extra ceremony to remove a `Maybe` is simply not worth the trouble, and your idea of "continuously propagating" is severely overblown. Again, we can write every single line of the rest of our program as if `Cache` exists. You don't need to "handle" anything extra (other than holding the concept of a slightly more complex value in your mind).


The difference between parsing and validation in the author's formulation is not between returning Maybe and throwing an exception, it's between returning a more precise type and not. Here are the types of the two versions of `getConfigurationDirectories`:

    getConfigurationDirectories :: IO [FilePath]
    getConfigurationDirectories :: IO (NonEmpty FilePath)
The second version is preferred because (NonEmpty FilePath) encodes the checked property in the type, which means emptiness doesn't have to be handled repeatedly throughout the rest of the program.
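To make the difference concrete, here is a minimal sketch using `Data.List.NonEmpty` from base. The names `splitOnComma`, `parseDirsList`, `parseDirsNonEmpty` and `firstDir` are made up for illustration and are not from the post, and the pure `String` input stands in for the environment lookup:

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Hypothetical helper: split a comma-separated string, dropping empty fields.
splitOnComma :: String -> [String]
splitOnComma s = filter (not . null) (go s)
  where
    go str = case break (== ',') str of
      (field, [])       -> [field]
      (field, _ : rest) -> field : go rest

-- Less precise: the result type still admits the empty list.
parseDirsList :: String -> [FilePath]
parseDirsList = splitOnComma

-- More precise: the emptiness check is recorded in the result type.
parseDirsNonEmpty :: String -> Maybe (NonEmpty FilePath)
parseDirsNonEmpty = nonEmpty . splitOnComma

-- Total on NonEmpty: no Maybe and no partial function needed.
firstDir :: NonEmpty FilePath -> FilePath
firstDir = NE.head
```

Both functions do the same check; only the second one tells the rest of the program that it happened.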

Yes the second version could have been changed to one of:

    getConfigurationDirectories :: IO (Maybe (NonEmpty FilePath))
    getConfigurationDirectories :: MaybeT IO (NonEmpty FilePath)
but this would only have moved the error reporting up one level to the main function. I would guess the existing version was chosen to simplify the types for a non-Haskell audience.

Your attempted 'improvement' of using

     getConfigurationDirectories :: IO (Maybe [FilePath])
is NOT an example of parsing because [FilePath] does not remove the possibility (in the types!) of the list being empty. When you later attempted to use

    maybeCache >>= useCache
this requires useCache to have type

    [FilePath] -> IO a
for some output type a. This function must deal with the possibility of the input list being empty because the type allows it. Every call to `head` returns (Maybe FilePath) and must handle the Nothing case. Neither I nor the author is unaware that there are many combinators that make this more convenient than explicit matching against Just/Nothing but doing so is strictly worse than returning a FilePath directly. Presumably none of the lower-level functions will be able to provide a default FilePath to use, so every single one will be forced to return a Maybe somewhere in its return type (or use fromJust, which is very ugly). This affects every single one of their callers, which will again be forced to propagate Maybe up to their own callers, etc.

To reiterate: the issue is not the possible non-existence of Cache, which can be handled in main. It's that the representation of Cache forces every single operation on it (of which head is just one simple example) to potentially have to represent conditions that should not actually be possible. This is a failure to 'make invalid states unrepresentable', which most proponents of static types aspire to.
You have the signature for `useCache` wrong. I defined it above (`Cache -> a`). Notice the concrete type...

I cannot stress this enough. You do not need to remove the possibility of a value not existing in order to compose a simple, coherent program. This is because `Maybe` is designed to handle all of the extra ceremony involved with utilizing such values. You only need to use `>>` (map) instead of `|>` (pipe) when invoking your functions. That is it.

All of the above is really beside the point though, because I am not arguing that one way is necessarily better than the other. I am arguing that the author's post is titled "Parse don't validate", that the perfect construct is right there to exemplify how parsing unstructured data into/through a system can be done, but then the author eschews it in favor of... validation (with what appears to be some tricks to fool the compiler)!

If your guard against an invalid state is to throw an exception you are validating. Attempting to redefine the terms to fit a particular narrative is a distraction that serves no one.

    > Neither I nor the author is unaware that there are many combinators that make this more convenient than explicit matching against Just/Nothing but doing so is strictly worse than returning a FilePath directly
I'd like you to define "strictly worse" here. In order for "strictly worse" to make any sense we would need to define "strictly better" to mean something like: "to have a reference to a variable in this particular scope that is definitely a `FilePath`". But why are variables in this scope (`main`) so important? You can get a reference to a `FilePath` directly whenever you need it through a `Maybe`:

    useFilePath :: FilePath -> a

    maybeFilePath <- (getConfigDirs >>= head)

    maybeFilePath >> useFilePath

This is opposed to something like:

    filePath <- getConfigDir // might throw

    filePath |> useFilePath

There is no difference in behavior and only a slight difference in implementation. I suppose if you really really wanted to `print` the value of `FilePath` from `main` (and not some other function), the second version would be preferred (though you could still match in the first version to create a block in main where `FilePath` is statically defined). Pretty arbitrary though.
> You have the signature for `useCache` wrong. I defined it above (`Cache -> a`)

Yes, sorry it's actually the line

    maybeCache <- (getConfDirs >>= head >> initializeCache)
which shows the issue.

> but then the author eschews it in favor of... validation (with what appears to be some tricks to fool the compiler)!

I think the author is pretty clear about how they're using the terms 'validation' and 'parsing' in the post - validation functions do not return a useful value while parsers refine the input type and carry a notion of failure. The first two examples of parsers they give are:

    nonEmpty :: [a] -> Maybe (NonEmpty a)
    parseNonEmpty :: [a] -> IO (NonEmpty a)
you seem to be arguing that parseNonEmpty is validating because it throws an exception instead of returning Maybe (NonEmpty a), but this isn't true here: just as Maybe signals failure by returning Nothing, errors within IO can be signaled with exceptions. The author hints at how these two parser types are related later on with:

    checkNoDuplicateKeys :: (MonadError AppError m, Eq k) => [(k, v)] -> m ()
There are MonadError instances for both IO and Maybe so the general parser type is something like

    MonadError e m => a -> m b
Admittedly this could have been made clearer if it was the intention, and returning Maybe is preferable to throwing exceptions in languages like Haskell.

If you were translating this approach to other languages like Java or C# though, you probably would throw exceptions to indicate failure, e.g.

    interface Parser<A, B> { B parse(A input) throws ParseException; }
so I don't think your objection holds in general.

> I'd like you to define "strictly worse" here

I'm saying you would always prefer to be handed an instance of an `a` instead of a (Maybe a) since it's more precise. You can trivially construct a (Maybe a) from an a but you can't easily go in the other direction. You either need to produce a default value or use a partial function like fromJust to obtain an 'a' from a 'Maybe a'. The motivation for the post is to show how using a more precise data type allows you to remove these from the rest of the code.

> But why are variables in this scope (`main`) so important

The issue doesn't happen in main, it happens throughout the rest of the program. The high-level structure is something like:

    main :: IO ()
    main = do
      maybeDirs <- getConfigDirs
      maybeDirs >>= restOfProgram
main only has to handle the parse failure and report any errors which will look similar regardless of whether getConfigDirs has type Maybe (NonEmpty FilePath) or IO (NonEmpty FilePath) (and throws an exception). But the representation of the directory list could be used anywhere in restOfProgram. Given a chain of applications fun1 -> fun2 -> ... -> funN, if funN accesses the file list with head and receives a (Maybe FilePath) there are three options:

1. Use fromJust since the list should be non-empty
2. Produce a default value
3. Propagate the Maybe in the return type of funN

Option 1 is messy, 2 is unlikely to be possible for a low-level function and 3 forces fun1 to fun(N-1) to either handle or propagate the partiality. Yes, using >>= and <=< etc. can hide this plumbing, but it can be made unnecessary in the first place.
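A rough sketch of the three options next to the NonEmpty version, using only base. `safeHead`, `cachePath` and the other names are hypothetical, and `safeHead` stands in for the Maybe-returning `head` assumed in this discussion:

```haskell
import Data.List.NonEmpty (NonEmpty ((:|)))
import qualified Data.List.NonEmpty as NE
import Data.Maybe (fromJust, fromMaybe, listToMaybe)

-- Safe head on a plain list, as assumed in the discussion above.
safeHead :: [a] -> Maybe a
safeHead = listToMaybe

-- Option 1: partial; crashes if the "cannot happen" case happens.
firstDirUnsafe :: [FilePath] -> FilePath
firstDirUnsafe = fromJust . safeHead

-- Option 2: needs a default that a low-level function rarely has.
firstDirOr :: FilePath -> [FilePath] -> FilePath
firstDirOr def = fromMaybe def . safeHead

-- Option 3: the Maybe leaks into this return type and into every caller's.
cachePath :: [FilePath] -> Maybe FilePath
cachePath dirs = fmap (++ "/cache") (safeHead dirs)

-- With NonEmpty none of the three is needed.
cachePathNE :: NonEmpty FilePath -> FilePath
cachePathNE dirs = NE.head dirs ++ "/cache"

-- Example value; the paths are placeholders.
exampleDirs :: NonEmpty FilePath
exampleDirs = "/etc" :| ["/home"]
```

Every function built on top of `cachePath` inherits the Maybe; functions built on `cachePathNE` do not.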

    > I'm saying you would always prefer to be handed an instance of an `a` instead of a (Maybe a) since it's more precise.
I disagree with this. `Maybe a` is more precise because it more closely represents the actual system within which we are working. It is simply a fact that our configuration directories might not exist. It is only within the author's own head that they prefer a concrete type because they value being able to point to their variable and say, "look I have this value! It's right here!" in a procedural sense, more than adopting a more functional approach.

    > You can trivially construct a (Maybe a) from an a but you can't easily go in the other direction. You either need to produce a default value or use a partial function like fromJust to obtain an 'a' from a 'Maybe a'
Again, the above is just not accurate! Or it is accurate in a very specific - "I want this particular value in this particular scope" - kind of way. Even in your example, we can be statically certain that `restOfProgram` will receive a value of type `[FilePath]`[0].

This is starting to feel like a waste of time. You are very much hung up on trying to defend the idea that using `Maybe` is something to be avoided. I understand where you are coming from. I really do. But you are simply not going to convince me because I prefer to model systems as a whole and I prefer to avoid doing extra gymnastics to solve already-solved problems. Throwing an exception? C'mon... we both know that example sucks.

My critique of the post really has nothing to do with choosing `Maybe` vs validating. My critique is that the author's code is utterly failing to exemplify parsing over validation! Using `Maybe` to chain parsers together in order to build an input would have been perfect. Unfortunately, they kind of mucked it up halfway through because they appear to be afraid of `Maybe`. It's a shame given that the post seems to have gotten around.

[0] This whole `NonEmpty` nonsense is a sideshow that's not worth discussing (other than to further illustrate how `Maybe` can be used to simplify multi-step parsing). What happens when you need the Nth element? You just keep re-defining the type to include more values? When we get to `NonEmpty6` I think maybe we will have realized we are on the wrong path. For our purposes it's better to think of `[FilePath]` as `Input` and not get bogged down in the specifics of its shape. The important bit is that it might not exist.

The entire point of Maybe is to imbue some type 'a' with an extra value - Nothing - along with a tag about which case you have. So (Maybe a) is always inhabited by more values than 'a', and that is the sense in which a variable of type a is 'more precise' than one of (Maybe a). I'm not saying Maybe is bad in any way - as you point out sometimes you do have to deal with the possibility of not having a value e.g. looking up a key in a map, looking up a user from a database etc. In Haskell there's no 'null' value which inhabits each type so you have to use Maybe, but even in languages like C# or Java where reference types all contain null I would still prefer to use Maybe/Optional to be explicit about the possibility. I don't think we disagree here. But at any point in a program you would always prefer to receive an 'a' over a (Maybe a) if you had the choice since there are fewer cases to deal with. This is the same reason languages like C# are adding support for non-nullable reference types.

Type-driven design is based around encoding invariants as much as is practical in the type system (what constitutes 'practical' is constrained by the type system you're using). The (NonEmpty a) type is just used to demonstrate a very simple example of this principle. In the same way that type 'a' is smaller than the type (Maybe a), (NonEmpty a) is smaller than [a], which means the operations on it are similarly more precise. This shows up in the two versions of head:

    head :: NonEmpty a -> a
    head :: [a] -> Maybe a
 
But this is just one example - you could replace it with different representations of a user in a web service

    data User = User {name :: String}
    type User = JsonValue
 
and the consequent difference in the types of the accessor for the name:

    getName :: User -> String
    getName :: JsonValue -> Maybe String
 
Far from being a 'sideshow' this is the main point of the approach - using a more precise representation makes all the operations on it similarly more precise globally throughout the program.
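As a sketch of the same idea for the user example: the `JsonValue` type below is a deliberately tiny stand-in rather than any real JSON library, and `getNameJson` and `parseUser` are hypothetical names:

```haskell
-- A tiny JSON model for illustration only; a real program would use a library.
data JsonValue
  = JString String
  | JObject [(String, JsonValue)]
  deriving (Eq, Show)

-- Precise representation: the name is guaranteed to be present.
newtype User = User { name :: String }
  deriving (Eq, Show)

-- Total accessor on the precise type.
getName :: User -> String
getName = name

-- Imprecise representation: every access can fail.
getNameJson :: JsonValue -> Maybe String
getNameJson (JObject fields) = case lookup "name" fields of
  Just (JString n) -> Just n
  _                -> Nothing
getNameJson _ = Nothing

-- The parser: checks once at the boundary and returns the narrower type.
parseUser :: JsonValue -> Maybe User
parseUser = fmap User . getNameJson
```

After `parseUser` succeeds, nothing downstream has to consider a missing or non-string name again.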

In your post the argument to restOfProgram has type [FilePath] but in the post it is (NonEmpty FilePath) so you need to handle the potential emptiness of the list everywhere you try to access it, either by propagating missing values to a higher level or using 'unsafe' functions like fromJust. It's defensible to prefer using a simpler representation type and dealing with the imprecision, but it's not doing the same thing - the types for a lot of the internals of your program will be quite different. This is probably the main philosophical difference with Clojure which prefers to use a small number of simple types along with dynamically checking desired properties at the point of use, something which tools like spec and schema make quite convenient. But people use static languages because of the global property checking, so it seems odd to me to endorse explicit modelling of missing values with Maybe while rejecting doing the same thing for non-emptiness since they are both lightweight approaches.

The insight of the original post is that if you choose to try to make your types precise in this way (and most Haskell programmers would, I believe) then the process of checking the properties you want to enforce from a less-precise representation is inseparable from the process of converting into the narrower representation. This narrowing process could fail and must therefore encode the representation for the failure case. Your insistence that Maybe should be used as the one true failure representation is wrong, I think. Throwing exceptions in Haskell is rare, but they could have also chosen (Either String) for example. Maybe isn't even a particularly good representation since it doesn't contain any way of describing the reason for the failure, just that it happened. I agree it would have been nice to see an example of parser composition using <=< etc. but it's not the main point of the article.
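For what it's worth, a composition example along those lines might look like the sketch below. It uses (Either String) and `>=>` (the left-to-right sibling of `<=<`) from base; the step names, the error messages and the comma-separated input format are all assumptions for illustration, not the article's code:

```haskell
import Control.Monad ((>=>))
import Data.List.NonEmpty (NonEmpty, nonEmpty)

-- Hypothetical error type: a plain message describing what went wrong.
type ParseError = String

-- Split a comma-separated string into fields; the empty string has no fields.
splitFields :: String -> [String]
splitFields "" = []
splitFields s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitFields rest

-- Narrow [a] to NonEmpty a, saying why it failed.
requireNonEmpty :: [a] -> Either ParseError (NonEmpty a)
requireNonEmpty = maybe (Left "no configuration directories given") Right . nonEmpty

-- Reject blank entries such as the middle field of "a,,b".
requireNoBlanks :: NonEmpty String -> Either ParseError (NonEmpty FilePath)
requireNoBlanks dirs
  | any null dirs = Left "empty directory name"
  | otherwise     = Right dirs

-- The composed parser: each step narrows the type or reports a reason.
parseConfigDirs :: String -> Either ParseError (NonEmpty FilePath)
parseConfigDirs = (requireNonEmpty >=> requireNoBlanks) . splitFields
```

Swapping `Either ParseError` for `Maybe` would compose the same way but lose the failure reason.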

I understand the purpose of `Maybe`.

    > In your post the argument to restOfProgram has type [FilePath] but in the post it is (NonEmpty FilePath) so you need to handle the potential emptiness of the list everywhere you try to access it
This is what I'm talking about. You are wasting energy on this line of thinking. Sure the author chose to parse a string into a list which then introduces the possibility of that list being empty. But we could have just chosen a different abstraction to hold our configuration that didn't suffer from this problem. Say:

    getConfiguration :: () -> Maybe { cache :: FilePath }
Now it's always non-empty. Don't get stuck on some intermediate representation. Again, I am uninterested in the details of the particular format of some input. My interest (and the thrust of this discussion) is about how to handle an input that might not exist. Specifically in terms of "parsing instead of validating".

    > Your insistence that Maybe should be used as the one true failure representation
I cannot stress this enough (I've said this at least twice now), I am not arguing that `Maybe` is "the one true way". I am arguing that the author is failing to exemplify how to parse your inputs vs. validating them. I am arguing that the code they wrote to help substantiate and illustrate their point about parsing accomplishes no such thing. It actually shows how to validate an input in a way that is confusing and no different than (in TS):

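    // returns a non-empty list of strings
    // (getEnv is assumed to return the variable's value as a string)
    const getConfigurationDirectories = (): [string, ...string[]] => {

         // drop empty fields: "".split(",") yields [""], never []
         const [first, ...rest] = getEnv("dirs").split(",").filter((d) => d !== "");

         if (first === undefined) throw new Error("ERROR");

         return [first, ...rest];
    }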
The above is not best-understood as a "parser". The above is validating the input. Trying to redefine "parsing" to mean "the result has a different return type" helps no one, and introducing `Maybe` into their example (while on the right track) isn't really necessary because they aren't using the `Maybe` (other than maybe as a crutch to satisfy the compiler).