by kingdomcome50 1782 days ago
> but avoiding Maybe is more preferable still

You can't avoid `Maybe` in this system. It is in the nature of the problem (as it is designed) that the input might not exist (and therefore a list might be empty). The question isn't one of avoidance, rather, integration. How do we deal with problems like the example?

"Parse don't validate" is a great way to deal with it! Even more convenient is the existence of a tool that can be used to offload all of the redundancy involved when choosing to parse instead of validate (i.e. throw an error).

It is the author's prerogative to value having a concrete value at one specific point in the program (`main`) over demonstrating how using `Maybe` can make parsing a breeze. Clearly you also value (for whatever reason) knowing that a variable contains a value at some specific, rather arbitrary point in the example program[0]. But it is an unfortunate choice given the title of the post.

Not only does the example code in the post not illustrate "parse don't validate" very well, it convolutes the solution considerably. My example above is able to achieve identical behavior in an easier-to-digest flow while also illustrating how parsing instead of validating can be done.

[0] Of course we know that any function we `map` over our `maybeCache` will for sure be invoked with an instance of `Cache`.
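A minimal Haskell sketch of the footnote's point. It assumes a hypothetical `Cache` newtype and `initializeCache`, since the real definitions aren't shown in this excerpt. A function mapped over a `Maybe Cache` with `fmap`/`<$>` is only ever called with an actual `Cache`.

```haskell
-- Hypothetical cache type; the thread never shows the real definition.
newtype Cache = Cache FilePath
  deriving (Eq, Show)

-- Hypothetical constructor: builds a Cache only when a directory is available.
initializeCache :: Maybe FilePath -> Maybe Cache
initializeCache = fmap Cache

-- Written as if the Cache definitely exists: no Maybe in its type.
useCache :: Cache -> String
useCache (Cache dir) = "cache at " ++ dir

-- (<$>) calls useCache only in the Just case; Nothing passes through untouched.
describe :: Maybe FilePath -> Maybe String
describe dir = useCache <$> initializeCache dir
```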


Your example does not achieve identical behaviour at all, since it 'parses' an [a] to another [a] and therefore throws away the very property you've just checked. The (NonEmpty a) type encodes the non-emptiness of the list in the type, which is then known at every point the list is accessed throughout the entire rest of the program. The point is not just to check the non-emptiness in main, as you appear to be implying. Any use of head on an [a] must continually deal with a (Maybe a) even though this possibility has been ruled out. In contrast, NonEmpty.head returns an element directly, so it removes entire chains of Maybes that would otherwise be propagated, whether conveniently with map/bind or otherwise. Parsing allows you to replace N + 1 instances of Maybe with just 1, so you can't claim your approaches are the same just because Maybe hasn't been eliminated entirely.
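A minimal sketch of the "N + 1 versus 1" point. It assumes illustrative accessors `firstDir` and `lastDir` that do not appear in the post. On a plain list every accessor has to return `Maybe`. After a single `nonEmpty` check, the `Data.List.NonEmpty` accessors return plain values.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE
import Data.Maybe (listToMaybe)

-- Plain-list accessors (illustrative): each one re-encodes "might be empty".
firstDir :: [FilePath] -> Maybe FilePath
firstDir = listToMaybe

lastDir :: [FilePath] -> Maybe FilePath
lastDir = listToMaybe . reverse

-- NonEmpty accessors: the check already happened, so these are total.
firstDir' :: NonEmpty FilePath -> FilePath
firstDir' = NE.head

lastDir' :: NonEmpty FilePath -> FilePath
lastDir' = NE.last

-- The single remaining Maybe lives at the parse boundary.
parseDirs :: [FilePath] -> Maybe (NonEmpty FilePath)
parseDirs = nonEmpty

-- Downstream code combines accessors with no further Maybe handling.
summary :: NonEmpty FilePath -> (FilePath, FilePath)
summary dirs = (firstDir' dirs, lastDir' dirs)
```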
I think you are confusing implementation with behavior. That is, I am achieving that same result through different means. I am mostly uninterested in specifically how the configuration string is parsed. It's not really important.

What is important is that we know we will have to deal with the possibility of something not existing. That is where the complexity lies, and where we want to take care to make our program as sensible as possible. Validating your input in order to throw an exception or return is one way to satisfy the compiler; another way is to use `Maybe` as intended. The author's "solution" is simply a poor illustration of parsing over validation (read that sentence again).

I suspect, and this applies to you as well, that they are just not comfortable working with the `Maybe` construct. Adding extra ceremony to remove a `Maybe` is simply not worth the trouble, and your idea of "continuously propagating" is severely overblown. Again, we can write every single line of the rest of our program as if `Cache` exists. You don't need to "handle" anything extra (other than holding the concept of a slightly more complex value in your mind).

The difference between parsing and validation in the author's formulation is not between returning Maybe and throwing an exception; it's between returning a more precise type and not. Here are the types of the two versions of `getConfigurationDirectories`:

    getConfigurationDirectories :: IO [FilePath]
    getConfigurationDirectories :: IO (NonEmpty FilePath)
The second version is preferred because the (NonEmpty FilePath) encodes the property that was checked in the type which means it doesn't have to be handled repeatedly throughout the entire rest of the program.

Yes the second version could have been changed to one of:

    getConfigurationDirectories :: IO (Maybe (NonEmpty FilePath))
    getConfigurationDirectories :: MaybeT IO (NonEmpty FilePath)
but this would only have moved the error reporting up one level to the main function. I would guess the existing version was chosen to simplify the types for a non-Haskell audience.
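A sketch of the two alternative signatures. It assumes a hypothetical `IO [FilePath]` action, passed in as `readRawDirs`, in place of however the post actually reads the directories. `MaybeT` comes from the `transformers` package. The last function shows that a later `MaybeT` step never runs when the list is empty.

```haskell
import Control.Monad.Trans.Maybe (MaybeT (..))
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Variant 1: the IO action returns Maybe explicitly.
-- readRawDirs is a hypothetical stand-in for the real directory lookup.
getConfigurationDirectoriesM :: IO [FilePath] -> IO (Maybe (NonEmpty FilePath))
getConfigurationDirectoriesM readRawDirs = nonEmpty <$> readRawDirs

-- Variant 2: the same computation wrapped in MaybeT.
getConfigurationDirectoriesT :: IO [FilePath] -> MaybeT IO (NonEmpty FilePath)
getConfigurationDirectoriesT readRawDirs = MaybeT (getConfigurationDirectoriesM readRawDirs)

-- Hypothetical follow-up step: in MaybeT, a Nothing above short-circuits it.
firstDirLength :: IO [FilePath] -> MaybeT IO Int
firstDirLength readRawDirs = do
  dirs <- getConfigurationDirectoriesT readRawDirs
  pure (length (NE.head dirs))
```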

Your attempted 'improvement' of using

     getConfigurationDirectories :: IO (Maybe [FilePath])
is NOT an example of parsing because [FilePath] does not remove the possibility (in the types!) of the list being empty. When you later attempted to use

    maybeCache >>= useCache
this requires useCache to have type

    [FilePath] -> IO a
for some output type a. This function must deal with the possibility of the input list being empty because the type allows it. Every call to `head` returns (Maybe FilePath) and must handle the Nothing case. Neither I nor the author is unaware that there are many combinators that make this more convenient than explicit matching against Just/Nothing but doing so is strictly worse than returning a FilePath directly. Presumably none of the lower-level functions will be able to provide a default FilePath to use, so every single one will be forced to return a Maybe somewhere in its return type (or use fromJust, which is very ugly). This affects every single one of their callers, which will again be forced to propagate Maybe up to their own callers, etc. To reiterate: the issue is not the possible non-existence of Cache, which can be handled in main. It's that the representation of Cache forces every single operation on it (of which head is just one simple example) to potentially have to represent conditions that should not actually be possible. This is a failure to 'make invalid states unrepresentable', which most proponents of static types aspire to.
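One way to "make invalid states unrepresentable" here is a smart constructor. The sketch below assumes a hypothetical `Cache` that wraps a `NonEmpty FilePath`; it is not the post's definition. The empty case is handled once in `mkCache`, and every operation on `Cache` is total.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Hypothetical Cache whose field type rules out the empty state.
newtype Cache = Cache (NonEmpty FilePath)

-- The one place the "might be empty" case is handled.
mkCache :: [FilePath] -> Maybe Cache
mkCache dirs = Cache <$> nonEmpty dirs

-- Operations on Cache are total: no Maybe, no fromJust.
primaryDir :: Cache -> FilePath
primaryDir (Cache dirs) = NE.head dirs

fallbackDirs :: Cache -> [FilePath]
fallbackDirs (Cache dirs) = NE.tail dirs

dirCount :: Cache -> Int
dirCount (Cache dirs) = NE.length dirs
```

In a real module you would export `Cache` abstractly, without its constructor, so that `mkCache` is the only way to build one.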
You have the signature for `useCache` wrong. I defined it above (`Cache -> a`). Notice the concrete type...

I cannot stress this enough. You do not need to remove the possibility of a value not existing in order to compose a simple, coherent program. This is because `Maybe` is designed to handle all of the extra ceremony involved with utilizing such values. You only need to use `>>` (map) instead of `|>` (pipe) when invoking your functions. That is it.

All of the above is really beside the point though, because I am not arguing that one way is necessarily better than the other. I am arguing that the author's post is titled "Parse don't validate", that the perfect construct is right there to exemplify how parsing unstructured data into/through a system can be done, but then the author eschews it in favor of... validation (with what appears to be some tricks to fool the compiler)!

If your guard against an invalid state is to throw an exception you are validating. Attempting to redefine the terms to fit a particular narrative is a distraction that serves no one.

> Neither I nor the author is unaware that there are many combinators that make this more convenient than explicit matching against Just/Nothing but doing so is strictly worse than returning a FilePath directly

I'd like you to define "strictly worse" here. In order for "strictly worse" to make any sense we would need to define "strictly better" to mean something like: "to have a reference to a variable in this particular scope that is definitely a `FilePath`". But why are variables in this scope (`main`) so important? You can get a reference to a `FilePath` directly whenever you need it through a `Maybe`:

    useFilePath :: FilePath -> a

    maybeFilePath <- (getConfigDirs >>= head)

    maybeFilePath >> useFilePath

This is opposed to something like:

    filePath <- getConfigDir -- might throw

    filePath |> useFilePath

There is no difference in behavior and only a slight difference in implementation. I suppose if you really really wanted to `print` the value of `FilePath` from `main` (and not some other function), the second version would be preferred (though you could still match in the first version to create a block in main where `FilePath` is statically defined). Pretty arbitrary though.
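Here are the two styles being compared, sketched as runnable Haskell. The sketch assumes the directory lookup is passed in as a hypothetical `IO [FilePath]` action. The `Maybe` version maps with `<$>` (the "map" above). The throwing version uses `throwIO` and applies `useFilePath` directly. `useFilePath` and `getConfigDir` are illustrative names, not from the post.

```haskell
import Control.Exception (IOException, throwIO, try)
import Data.Maybe (listToMaybe)

-- Hypothetical consumer that only ever sees a real FilePath.
useFilePath :: FilePath -> String
useFilePath path = "reading " ++ path

-- Style 1: keep the possible absence in a Maybe and map over it.
withMaybe :: IO [FilePath] -> IO (Maybe String)
withMaybe getConfigDirs = do
  maybeFilePath <- listToMaybe <$> getConfigDirs
  pure (useFilePath <$> maybeFilePath)

-- Style 2: throw if there is no directory, then apply the function directly.
getConfigDir :: IO [FilePath] -> IO FilePath
getConfigDir getConfigDirs = do
  dirs <- getConfigDirs
  case dirs of
    (d : _) -> pure d
    []      -> throwIO (userError "no configuration directory") -- might throw

withThrow :: IO [FilePath] -> IO String
withThrow getConfigDirs = useFilePath <$> getConfigDir getConfigDirs

-- Catching the exception turns style 2 back into a value-level result.
withThrowCaught :: IO [FilePath] -> IO (Either IOException String)
withThrowCaught = try . withThrow
```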
> You have the signature for `useCache` wrong. I defined it above (`Cache -> a`)

Yes, sorry, it's actually the line

    maybeCache <- (getConfDirs >>= head >> initializeCache)
which shows the issue.

> but then the author eschews it in favor of... validation (with what appears to be some tricks to fool the compiler)!

I think the author is pretty clear about how they're using the terms 'validation' and 'parsing' in the post - validation functions do not return a useful value while parsers refine the input type and carry a notion of failure. The first two examples of parsers they give are:

    nonEmpty :: [a] -> Maybe (NonEmpty a)
    parseNonEmpty :: [a] -> IO (NonEmpty a)
You seem to be arguing that parseNonEmpty is validating because it throws an exception instead of returning Maybe (NonEmpty a), but that isn't true here: just as Maybe signals failure by returning Nothing, errors within IO can be signaled with exceptions. The author hints at how these two parser types are related later on with:
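A minimal sketch of how both signatures can be implemented. This is not necessarily the post's exact code. The hand-written `nonEmpty` behaves like `Data.List.NonEmpty.nonEmpty` from base. `try` and `maybe` show that the two failure channels convert into each other.

```haskell
import Control.Exception (IOException, throwIO, try)
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE

-- Failure as a value: Nothing.
nonEmpty :: [a] -> Maybe (NonEmpty a)
nonEmpty []       = Nothing
nonEmpty (x : xs) = Just (x :| xs)

-- Failure as an IO exception; the refined result type is the same.
parseNonEmpty :: [a] -> IO (NonEmpty a)
parseNonEmpty []       = throwIO (userError "list cannot be empty")
parseNonEmpty (x : xs) = pure (x :| xs)

-- Catching the exception recovers a value-level failure ...
parseNonEmptyCaught :: [a] -> IO (Either IOException (NonEmpty a))
parseNonEmptyCaught = try . parseNonEmpty

-- ... and a Maybe result (here from base's NE.nonEmpty) can become an exception.
parseViaMaybe :: [a] -> IO (NonEmpty a)
parseViaMaybe xs = maybe (throwIO (userError "list cannot be empty")) pure (NE.nonEmpty xs)
```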

    checkNoDuplicateKeys :: (MonadError AppError m, Eq k) => [(k, v)] -> m ()
There are MonadError instances for both IO and Maybe so the general parser type is something like

    MonadError e m => a -> m b
Admittedly, this could have been made clearer if it was the intention, and returning Maybe is preferable to throwing exceptions in languages like Haskell.
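A sketch of that general parser type, assuming the `mtl` package for `MonadError`. The same function is instantiated at `Either String`, where failure is a `Left`, and at `IO`, where failure is a thrown `IOException`. mtl also has a `Maybe` instance whose error type is `()`.

```haskell
import Control.Monad.Except (MonadError, throwError)
import Data.List.NonEmpty (NonEmpty (..))

-- One parser, polymorphic in how failure is reported.
parseNonEmptyM :: MonadError e m => e -> [a] -> m (NonEmpty a)
parseNonEmptyM err []       = throwError err
parseNonEmptyM _   (x : xs) = pure (x :| xs)

-- Instantiated at Either: failure is a Left value.
parseEither :: [a] -> Either String (NonEmpty a)
parseEither = parseNonEmptyM "list cannot be empty"

-- Instantiated at IO: failure is a thrown IOException (userError builds one).
parseIO :: [a] -> IO (NonEmpty a)
parseIO = parseNonEmptyM (userError "list cannot be empty")
```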

If you were translating this approach to other languages like Java or C#, though, you would probably throw exceptions to indicate failure, e.g.

    interface Parser<A, B> { B parse(A input) throws ParseException; }
so I don't think your objection holds in general.

> I'd like you to define "strictly worse" here

I'm saying you would always prefer to be handed an instance of an `a` instead of a (Maybe a) since it's more precise. You can trivially construct a (Maybe a) from an a but you can't easily go in the other direction. You either need to produce a default value or use a partial function like fromJust to obtain an 'a' from a 'Maybe a'. The motivation for the post is to show how using a more precise data type allows you to remove these from the rest of the code.
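The asymmetry in a few lines (the names are illustrative). Going from the precise type to the imprecise one is total. Going back needs a default or has to report failure in its result.

```haskell
import Data.List.NonEmpty (NonEmpty (..), nonEmpty)
import qualified Data.List.NonEmpty as NE
import Data.Maybe (fromMaybe)

-- Widening is total and trivial.
widen :: a -> Maybe a
widen = Just

forget :: NonEmpty a -> [a]
forget = NE.toList

-- Narrowing needs a default value ...
narrowWithDefault :: a -> Maybe a -> a
narrowWithDefault = fromMaybe

-- ... or must report failure in its own result type.
narrowList :: [a] -> Maybe (NonEmpty a)
narrowList = nonEmpty
```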

> But why are variables in this scope (`main`) so important

The issue doesn't happen in main, it happens throughout the rest of the program. The high-level structure is something like:

    main :: IO ()
    main = do
      maybeDirs <- getConfigDirs
      maybeDirs >>= restOfProgram
main only has to handle the parse failure and report any errors, which will look similar regardless of whether getConfigDirs has type IO (Maybe (NonEmpty FilePath)) or IO (NonEmpty FilePath) (and throws an exception). But the representation of the directory list could be used anywhere in restOfProgram. Given a chain of applications fun1 -> fun2 -> ... -> funN, if funN accesses the file list with head and receives a (Maybe FilePath), there are three options:
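A runnable version of that structure. It assumes hypothetical `getConfigDirs` and `restOfProgram` that take their input as parameters, so they can be exercised without real configuration files. It uses a `case` rather than `>>=` so that the failure branch in main is explicit.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import qualified Data.List.NonEmpty as NE

-- Hypothetical stand-in: parse whatever the raw lookup returns.
getConfigDirs :: IO [FilePath] -> IO (Maybe (NonEmpty FilePath))
getConfigDirs readRaw = nonEmpty <$> readRaw

-- Everything past main works with NonEmpty and never sees a Maybe.
restOfProgram :: NonEmpty FilePath -> IO String
restOfProgram dirs = pure ("using " ++ NE.head dirs)

-- main's only job: handle the single parse failure, then hand off.
app :: IO [FilePath] -> IO String
app readRaw = do
  maybeDirs <- getConfigDirs readRaw
  case maybeDirs of
    Nothing   -> pure "error: no configuration directories"
    Just dirs -> restOfProgram dirs
```

A real entry point would then be something like `main = app readRawDirs >>= putStrLn`, where `readRawDirs` (hypothetical) is whatever actually reads the configuration.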

1. Use fromJust, since the list should be non-empty
2. Produce a default value
3. Propagate the Maybe in the return type of funN

Option 1 is messy, option 2 is also unlikely for a low-level function, and option 3 forces fun1 to fun(N - 1) to either handle or propagate the partiality. Yes, using >>= and <=< etc. can hide this plumbing, but the plumbing can be made unnecessary in the first place.
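A minimal sketch comparing option 3 with the NonEmpty representation. The functions `fun1`/`fun2`/`funN` are illustrative stand-ins for the chain described above. With a list, every level's type carries `Maybe`. With `NonEmpty`, the chain is plain composition.

```haskell
import Data.List.NonEmpty (NonEmpty (..))
import qualified Data.List.NonEmpty as NE
import Data.Maybe (listToMaybe)

-- Option 3 on a plain list: funN returns Maybe, so each caller
-- up the chain has to propagate it (here with <$>).
funN :: [FilePath] -> Maybe FilePath
funN = listToMaybe

fun2 :: [FilePath] -> Maybe FilePath
fun2 dirs = (++ "/config.toml") <$> funN dirs

fun1 :: [FilePath] -> Maybe Int
fun1 dirs = length <$> fun2 dirs

-- With NonEmpty the same chain is ordinary function composition.
funN' :: NonEmpty FilePath -> FilePath
funN' = NE.head

fun2' :: NonEmpty FilePath -> FilePath
fun2' = (++ "/config.toml") . funN'

fun1' :: NonEmpty FilePath -> Int
fun1' = length . fun2'
```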

> I'm saying you would always prefer to be handed an instance of an `a` instead of a (Maybe a) since it's more precise.

I disagree with this. `Maybe a` is more precise because it more closely represents the actual system within which we are working. It is simply a fact that our configuration directories might not exist. It is only within the author's own head that they prefer a concrete type because they value being able to point to their variable and say, "look I have this value! It's right here!" in a procedural sense, more than adopting a more functional approach.

> You can trivially construct a (Maybe a) from an a but you can't easily go in the other direction. You either need to produce a default value or use a partial function like fromJust to obtain an 'a' from a 'Maybe a'

Again, the above is just not accurate! Or it is accurate in a very specific - "I want this particular value in this particular scope" - kind of way. Even in your example, we can be statically certain that `restOfProgram` will receive a value of type `[FilePath]`[0].

This is starting to feel like a waste of time. You are very much hung up on trying to defend the idea that using `Maybe` is something to be avoided. I understand where you are coming from. I really do. But you are simply not going to convince me because I prefer to model systems as a whole and I prefer to avoid doing extra gymnastics to solve already-solved problems. Throwing an exception? C'mon... we both know that example sucks.

My critique of the post really has nothing to do with choosing `Maybe` vs validating. My critique is that the author's code is utterly failing to exemplify parsing over validation! Using `Maybe` to chain parsers together in order to build an input would have been perfect. Unfortunately, they kind of mucked it up halfway through because they appear to be afraid of `Maybe`. It's a shame given that the post seems to have gotten around.
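For comparison, here is a minimal sketch of chaining `Maybe` parsers to build an input. It assumes a hypothetical `Config` record and key/value input; neither is from the post. Each step returns `Maybe`, and the `do` block stops at the first `Nothing`.

```haskell
import Data.List.NonEmpty (NonEmpty, nonEmpty)
import Text.Read (readMaybe)

-- Hypothetical parsed configuration.
data Config = Config
  { configDirs :: NonEmpty FilePath
  , cacheSize  :: Int
  } deriving (Eq, Show)

-- Each step is a small parser that either refines its input or fails.
parseDirs :: [(String, String)] -> Maybe (NonEmpty FilePath)
parseDirs kvs = nonEmpty [v | (k, v) <- kvs, k == "dir"]

parseCacheSize :: [(String, String)] -> Maybe Int
parseCacheSize kvs = lookup "cache_size" kvs >>= readMaybe

-- Chained in the Maybe monad: the first Nothing stops the whole parse.
parseConfig :: [(String, String)] -> Maybe Config
parseConfig kvs = do
  dirs <- parseDirs kvs
  size <- parseCacheSize kvs
  pure (Config dirs size)
```

Since neither step depends on the other's result, the same chain can also be written applicatively as `Config <$> parseDirs kvs <*> parseCacheSize kvs`.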

[0] This whole `NonEmpty` nonsense is a sideshow that's not worth discussing (other than to further illustrate how `Maybe` can be used to simplify multi-step parsing). What happens when you need the Nth element? You just keep re-defining the type to include more values? When we get to `NonEmpty6` I think maybe we will have realized we are on the wrong path. For our purposes it's better to think of `[FilePath]` as `Input` and not get bogged down in the specifics of its shape. The important bit is that it might not exist.