by lkitching 1782 days ago
This line:

    maybeCache <- (getConfDirs >>= head >> initializeCache)
is doing exactly what the post is arguing against. getConfDirs validates that the list is non-empty, but the [FilePath] it returns does not encode that information. Now you immediately have to handle the possibility of a missing value from head that you already know cannot happen. This isn't too apparent here since you've combined it into a single expression, but if you need to pass the confDirs list to any other part of the program, that code will also have to keep handling the possibility of the list being empty even though you already checked for it. Every function that interacts with the confDirs list will have to include (Maybe a) in its return type unnecessarily. The post is not suggesting you can remove Maybe entirely, but it has moved it to a single point in the program (the point where the config dirs list is checked for emptiness) and removed it everywhere else. Your approach must continually guard against an impossible condition everywhere the dirs list is accessed because you discard the property you checked for in getConfDirs.

The monadic operators make it convenient to propagate missing values through a chain of operations, but they are not the primary benefit of an explicit Maybe type. Much like IO, the benefit of having an explicit Maybe type is when you _don't_ have it, since its absence represents more information at that point in the program. Likewise a (NonEmpty a) contains more information than [a], which consequently makes the implementation of head more informative.

The parsers in this approach have types like

    a -> Maybe b
where type b contains the extra information extracted by the parser. Your getConfDirs function only contains a function with type

    [a] -> Maybe [a]
so isn't parsing in the same way.
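
To make the contrast concrete, here is a minimal sketch of the two shapes. The names validateDirs, parseDirs and firstDir are hypothetical and appear in neither the post nor the parent comment; only nonEmpty and head from Data.List.NonEmpty are real library functions.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Validation-shaped (hypothetical name): the check runs, but the result
-- is still a plain list, so the type forgets that it is non-empty.
validateDirs :: [FilePath] -> Maybe [FilePath]
validateDirs [] = Nothing
validateDirs xs = Just xs

-- Parser-shaped (hypothetical name): the result type NonEmpty FilePath
-- records the property that was checked.
parseDirs :: [FilePath] -> Maybe (NonEmpty FilePath)
parseDirs = nonEmpty

-- Total: no Maybe is needed because the type rules out the empty case.
firstDir :: NonEmpty FilePath -> FilePath
firstDir = NE.head
```

Both functions can fail on an empty input, but only the value returned by parseDirs can be passed to firstDir without another emptiness check.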

I understand what the author is doing. I said this earlier but it bears repeating: the author seems to be more concerned with having a concrete type than with simpler code. A reference to `Maybe Cache` is good enough (and preferred). The top-level of your program is precisely where you want to have the flexibility to deal with the above.

Furthermore, my example is a much better illustration of the axiom ("Parse don't validate") than what the author is doing -- which is more like "Parse and validate".

You need to clarify "continuously guard". Sure you have to invoke methods like:

    maybeCache >> useCache // map 
instead of:

    maybeCache |> useCache // not sure how Haskell pipes
Is that too difficult? The `Maybe` monad is specifically designed so that you don't have to continuously guard against the possibility of the value not existing. That is, you can "map", "bind" and "apply" functions to the value as if it always exists (and it handles the situation when the value doesn't). I also included a `case` block within which you can be statically certain a value of type `Cache` is available if you really need it.

The purpose of `Maybe` is to simplify code that needs to deal with a value that might not exist. Attempting to organize your code to avoid using `Maybe` is, by definition, going to be more cumbersome than simply leaning into the construct (that's what it's for!). It also better illustrates how "parse don't validate" should work. Using an exception to guard an invariant is... validating, not parsing.

You don't need to defend the author here. It's just a matter of fact that the code provided could be organized differently according to a more idiomatic usage of `Maybe`, and would therefore be a more illustrative example of their own point. The choice to exemplify something else is unfortunate and is the thrust of this entire comment thread. I felt like I had to say something now, seeing that link a second time.

The author explains what they mean by parsing in the post:

> Really, a parser is just a function that consumes less-structured input and produces more-structured output. By its very nature, a parser is a partial function—some values in the domain do not correspond to any value in the range—so all parsers must have some notion of failure. Often, the input to a parser is text, but this is by no means a requirement, and parseNonEmpty is a perfectly cromulent parser: it parses lists into non-empty lists, signaling failure by terminating the program with an error message.

So the properties checked by the parser are reflected in the output type. Reifying these properties in the type is what allows the validation to be done once at the top level and avoided throughout the rest of the program. Your complaint about throwing exceptions is focusing on an irrelevant detail in a small example - yes this could have been moved into the main function but doesn't affect the overall behaviour.

However, your argument that propagating Maybe values is more idiomatic than parsing into a more precise type is one that I, and I assume most static typing advocates, would disagree with. Given the choice you would always prefer an 'a' over a 'Maybe a', since a Maybe represents a point of uncertainty which you would rather not have. As a result, having to chain this imprecision using various combinators is inherently more complex than not having to do so. Yes, using bind etc. is preferable to manually destructing Maybe values but avoiding Maybe is more preferable still.

> but avoiding Maybe is more preferable still

You can't avoid `Maybe` in this system. It is in the nature of the problem (as it is designed) that the input might not exist (and therefore a list might be empty). The question isn't one of avoidance, rather, integration. How do we deal with problems like the example?

"Parse don't validate" is a great way to deal with it! Even more convenient is the existence of a tool that can be used to offload all of the redundancy involved when choosing to parse instead of validate (i.e. throw an error).

It is the author's prerogative to value having a concrete value at one specific point in the program (`main`) over demonstrating how using `Maybe` can make parsing a breeze. Clearly you also value (for whatever reason) knowing that a variable contains a value at some specific, rather arbitrary point in the example program[0]. But it is an unfortunate choice given the title of the post.

Not only does the example code in the post not illustrate "parse don't validate" very well, it convolutes the solution considerably. My example above is able to achieve identical behavior in an easier-to-digest flow while also illustrating how parsing instead of validating can be done.

[0] Of course we know that any function we `map` over our `maybeCache` will for sure be invoked with an instance of `Cache`.

Your example does not achieve identical behaviour at all, since it 'parses' an [a] to another [a] and therefore throws away the very property you've just checked. The (NonEmpty a) type encodes the non-emptiness of the list, which is then known at every point the list is accessed throughout the entire rest of the program. The point is not just to check the non-emptiness in main as you appear to be implying. Any use of head on an [a] must continually deal with a (Maybe a) even though this possibility has been ruled out. In contrast, NonEmpty.head returns an element directly, so it removes entire chains of Maybes that would otherwise be propagated, conveniently with map/bind or otherwise. Parsing allows you to replace N + 1 instances of Maybe with just 1, so you can't claim your approaches are the same just because it hasn't been eliminated entirely.

I think you are confusing implementation with behavior. That is, I am achieving the same result through different means. I am mostly uninterested in specifically how the configuration string is parsed. It's not really important.

What is important is that we know we will have to deal with the possibility of something not existing. That is where the complexity lies, and where we want to take care to make our program as sensible as possible. Validating your input to throw an exception or return is one way to satisfy the compiler, another way is to use `Maybe` as intended. The author's "solution" is simply a poor illustration of parsing over validation (read that sentence again).

I suspect, and this applies to you as well, that they are just not comfortable working with the `Maybe` construct. Adding extra ceremony to remove a `Maybe` is simply not worth the trouble, and your idea of "continuously propagating" is severely overblown. Again, we can write every single line of the rest of our program as if `Cache` exists. You don't need to "handle" anything extra (other than holding the concept of a slightly more complex value in your mind).

The difference between parsing and validation in the author's formulation is not between returning Maybe and throwing an exception, it's between returning a more precise type and not. Here are the types of the two versions of `getConfigurationDirectories`:

    getConfigurationDirectories :: IO [FilePath]
    getConfigurationDirectories :: IO (NonEmpty FilePath)
The second version is preferred because the (NonEmpty FilePath) encodes the property that was checked in the type which means it doesn't have to be handled repeatedly throughout the entire rest of the program.

Yes the second version could have been changed to one of:

    getConfigurationDirectories :: IO (Maybe (NonEmpty FilePath))
    getConfigurationDirectories :: MaybeT IO (NonEmpty FilePath)
but this would only have moved the error reporting up one level to the main function. I would guess the existing version was chosen to simplify the types for a non-Haskell audience.
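
As a rough sketch of the first of those two signatures: the CONFIG_DIRS variable name, the comma-separated format, and the helpers splitDirs, parseConfigDirs, primaryDir and run are all assumptions made for illustration, not code taken from the post.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE
import System.Environment (lookupEnv)

-- Hypothetical helper: split a comma-separated string, dropping empty fields.
splitDirs :: String -> [FilePath]
splitDirs s = filter (not . null) (go s)
  where
    go str = case break (== ',') str of
      (field, [])       -> [field]
      (field, _ : rest) -> field : go rest

-- The single point where failure is possible: the result is either
-- Nothing or a list whose type guarantees at least one element.
parseConfigDirs :: String -> Maybe (NonEmpty FilePath)
parseConfigDirs = nonEmpty . splitDirs

-- The IO (Maybe (NonEmpty FilePath)) variant: the caller reports failure.
getConfigurationDirectories :: IO (Maybe (NonEmpty FilePath))
getConfigurationDirectories = do
  raw <- lookupEnv "CONFIG_DIRS" -- variable name is an assumption
  pure (raw >>= parseConfigDirs)

-- Downstream code takes NonEmpty and needs no Maybe at all.
primaryDir :: NonEmpty FilePath -> FilePath
primaryDir = NE.head

-- Stand-in for main: the only place the Nothing case is handled.
run :: IO ()
run = do
  dirs <- getConfigurationDirectories
  case dirs of
    Nothing -> putStrLn "CONFIG_DIRS is unset or empty"
    Just ds -> putStrLn ("primary config dir: " ++ primaryDir ds)
```

The Maybe appears once, in run, and primaryDir and anything else taking the NonEmpty value stays free of it.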

Your attempted 'improvement' of using

    getConfigurationDirectories :: IO (Maybe [FilePath])
is NOT an example of parsing because [FilePath] does not remove the possibility (in the types!) of the list being empty. When you later attempted to use

    maybeCache >>= useCache
this requires useCache to have type

    [FilePath] -> IO a
for some output type a. This function must deal with the possibility of the input list being empty because the type allows it. Every call to `head` returns (Maybe FilePath) and must handle the Nothing case. Neither I nor the author is unaware that there are many combinators that make this more convenient than explicit matching against Just/Nothing, but doing so is strictly worse than returning a FilePath directly. Presumably none of the lower-level functions will be able to provide a default FilePath to use, so every single one will be forced to return a Maybe somewhere in its return type (or use fromJust, which is very ugly). This affects every single one of their callers, which will again be forced to propagate Maybe up to their callers, etc.

To reiterate: the issue is not the possible non-existence of Cache, which can be handled in main. It's that the representation of Cache forces every single operation on it (of which head is just one simple example) to potentially have to represent conditions that should not actually be possible. This is a failure to 'make invalid states unrepresentable', which most proponents of static types aspire to.
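
A small sketch of that propagation follows. The function names and the "/cache" suffix are made up for illustration. Since Prelude's head is partial, the list version uses listToMaybe from Data.Maybe as the safe head.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Maybe (listToMaybe)

-- With [FilePath], a total head has to return Maybe...
firstDirList :: [FilePath] -> Maybe FilePath
firstDirList = listToMaybe

-- ...and that Maybe leaks into every function built on top of it,
-- however conveniently fmap lets us write it.
cachePathList :: [FilePath] -> Maybe FilePath
cachePathList dirs = fmap (++ "/cache") (firstDirList dirs)

-- With NonEmpty, head is total and callers return plain values.
cachePathNonEmpty :: NonEmpty FilePath -> FilePath
cachePathNonEmpty dirs = NE.head dirs ++ "/cache"
```

Every caller of cachePathList inherits the Maybe in turn, while callers of cachePathNonEmpty never see one.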