| The reason that the Monad structure is interesting, in my view, is that it's simple and neatly captures the notion of a "computation" that we can compose in different ways. The core useful operator for monads in Haskell is >>=. It has the following type:
  m a -> (a -> m b) -> m b
If you squint, this is like function application with a few extra m's thrown in. Here's normal function application for comparison:
  a -> (a -> b) -> b
So what is this extra m useful for? It's like a hole where we get to plug in some custom logic. In a sense, it lets us change what it means to "apply" a "function". This turns out to be useful for a whole bunch of things, not just IO. (Honestly, from a pedagogical standpoint, I think IO is a bit of a distraction!)
For example, take the Maybe type. It's Haskell's nullable: a Maybe a means you either have an a or Nothing.
  data Maybe a = Just a | Nothing
Remembering the signature of >>= above, what's the most natural way to implement "application"? If we break the function into cases, it becomes pretty straightforward. Here's the specialized function signature we want to implement:
  (>>=) :: Maybe a -> (a -> Maybe b) -> Maybe b
  x >>= f = ...
If x is Nothing then we don't have anything to pass into f, so the whole result has to be Nothing. If x has a value, we can just get that value out and pass it into f normally, returning the final result.
  Nothing >>= f = Nothing
  Just x >>= f = f x
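To see those two equations do some work, here is a minimal sketch. It hides the Prelude's own Maybe so the definition can be spelled out by hand. Current GHC also requires Functor and Applicative instances, which are filled in here in the obvious way. The names safeDiv, chained and failed are made up for the illustration.

```haskell
import Prelude hiding (Maybe (..))

-- Our own copy of Maybe, so we can write the instance ourselves.
data Maybe a = Just a | Nothing
  deriving (Show, Eq)

-- Modern GHC requires Functor and Applicative before Monad.
instance Functor Maybe where
  fmap _ Nothing  = Nothing
  fmap f (Just x) = Just (f x)

instance Applicative Maybe where
  pure = Just
  Nothing <*> _ = Nothing
  Just f  <*> x = fmap f x

-- The two equations from the text, unchanged apart from the wildcard.
instance Monad Maybe where
  Nothing >>= _ = Nothing
  Just x  >>= f = f x

-- Hypothetical helper: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- (100 / 5) / 2, with no explicit null check between the steps.
chained :: Maybe Int
chained = safeDiv 100 5 >>= \q -> safeDiv q 2

-- The first step fails, so the second one is never run.
failed :: Maybe Int
failed = safeDiv 100 0 >>= \q -> safeDiv q 2

main :: IO ()
main = do
  print chained  -- Just 10
  print failed   -- Nothing
```

Neither chained nor failed ever inspects the intermediate result; the Nothing case is handled entirely inside (>>=).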
(In case you're not familiar with Haskell syntax, the above is actually a valid definition of (>>=) for Maybe!)
For Maybe, being a monad gives us a standard way of working with values while automatically dealing with Nothing. It abstracts over repetitive null checking and lets us easily build up Maybe values based on other Maybe values.
Other examples of monads are the same in spirit. The list monad, for example, lets us handle any number of inputs in a way that's similar to Maybe. The State monad similarly lets us combine values while carrying along an implicit state internally. The rest of the Monad structure (namely the return function and the laws) is just a formal way of codifying behavior that's already intuitive.
So how does this all apply to doing input and output? Well, the problem in Haskell is that it's a language of evaluating expressions at heart: executing effects makes no more sense than it would in arithmetic. To work with effects we instead have a special, opaque type, IO: normal expressions get evaluated to IO actions, which can then be run to produce the desired effect.
Critically, the IO type does not have to be a monad. It could be completely self-contained and have custom functions for doing one action after the other. We could imagine something like:
  after :: IO a -> IO b -> IO b
which would let you run an IO statement, then run a second one and only return the value of the last one. It's like an imperative block of code!
However, we would also like some way of using the results of an IO statement, perhaps assigning them to a name. We can't do this normally because the IO statements are run separately from expression evaluation. We'd have to have some sort of function that could take an IO value, unwrap it and do something with it. And how would we express an interface like this? With a normal function!
  doSomething :: IO a -> (a -> IO b) -> IO b
Hey, doesn't that look familiar? It's exactly (>>=)!
I'm hand-waving a bit again, but the rest of the monad structure comes up when you try to make sure after behaves consistently and intuitively.
So IO being a monad emerges naturally from the desire to be able to compose actions and depend on their results in a way that's separate from normal variable bindings and expression evaluation. The causation here is important: it's not that IO has to be a monad, but rather that the IO type (which could exist on its own) happens to naturally and usefully form a monad. But it does a lot of other things too, including some specific capabilities (like spawning threads) that are hard to generalize.
So my point, I suppose, is twofold: monads are useful for combining some notion of computation, and IO happens to be an interesting example, but the fact that we wrap statements with external effects in a custom type called IO does not inextricably depend on the idea of a monad.
Did that explanation help? I wrote a blog post on a similar topic that might be interesting too: http://jelv.is/blog/Haskell-Monads-and-Purity |
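As a rough sketch of the parent comment's point: the two imagined functions, after and doSomething, can be written today in terms of what IO's Monad instance already provides. The name demo and the use of an IORef (only there to make the effects observable) are assumptions of this example, not part of the original explanation.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Run the first action, discard its result, then run the second.
-- This is what the Prelude calls (>>).
after :: IO a -> IO b -> IO b
after first second = first >>= \_ -> second

-- Run an action and feed its result to a function that picks the next one.
-- This is exactly (>>=) specialized to IO.
doSomething :: IO a -> (a -> IO b) -> IO b
doSomething = (>>=)

-- Hypothetical demo: create a counter, bump it twice, read it back.
demo :: IO Int
demo =
  newIORef (0 :: Int) `doSomething` \ref ->
    modifyIORef ref (+ 1) `after`
    modifyIORef ref (* 10) `after`
    readIORef ref

main :: IO ()
main = demo `doSomething` print  -- prints 10
```

Nothing here uses do-notation: sequencing with after and naming a result with doSomething are enough to write the whole "imperative block".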
You know what else doesn't make sense in arithmetic? Computation. It's worth asking why mathematical definitions of computation didn't come about until the 20th century[1], even though math had already had most of its big breakthroughs (to date) by then (or, at least, there was no revolution in mathematics in the 20th century on the same scale as there was in physics). As advanced as math was, computation was only defined very late in the game. The reason is that, perhaps ironically, the concept of computation is absent from most of math, which is "equational".
But computation is the very core of computer science. So, for example, in arithmetic, 4 and 2+2 are the same thing because 4 = 2 + 2. So in arithmetic, equality means "sameness". But in computer science they are most certainly not the same, because the process of moving from one of the representations to the other is precisely what a computation does. Not only that, the process isn't even necessarily symmetrical, as moving in one direction may entail a much higher computational complexity than going in the other direction.
Any distinction between calculation and effect (other than actual IO) is, therefore, arbitrary. Haskell draws that distinction in a particular place (waiting for one second may or may not be an effect depending on how it is implemented) and in a particular form: pure functions, equational reasoning, and effects described with monads. Whether drawing it that way is actually useful and beneficial is a question for psychologists. Given Haskell's design, monads are certainly useful in Haskell, but that utility can't be generalized to languages with other designs.
[1]: https://en.wikipedia.org/wiki/Computability_theory