by zoomerang 4269 days ago
Not really. The point of Haskell is not to avoid having side effects. The point of Haskell is to allow code to be referentially transparent - this makes it both easier to reason about as a developer, and easier for the runtime to optimise.

Yes, that is the altar upon which you sacrifice the ability to write print statements to do debugging, and lose the ability to reason about order of execution. But does it really result in more performant code? Every benchmark I've ever seen, more practical languages like Ocaml have come out on top.
> that is the altar upon which you sacrifice the ability

I see statements like this all the time from people who fundamentally misunderstand Haskell, and I used to have the same misunderstandings myself. You really don't sacrifice anything by using it.

> the ability to write print statements to do debugging

I can slap a `trace` statement wherever the fuck I want inside my Haskell code for debugging. Even inside a pure function, no IO monad required. If I want to add a logger to my code, a 'Writer' monad is almost completely transparent, or I can cheat and use unsafePerformIO.

> and lose the ability to reason about order of execution.

If I'm writing pure code, then order of execution is irrelevant. It simply does not matter. If I'm writing impure code, then I encode order of execution by writing imperative (looking) code using do-notation, and it looks and works just like it would in any imperative language.

> But does it really result in more performant code

Haskell has really surprised me with its performance. I've only really been using it for a short time, having been on the Java bandwagon for a long time.

One example I had recently involved loading some data from disk, doing some transforms, and spitting out a summary. For shits and giggles, we wrote a few different implementations to compare.

Haskell won, even beating the reference 'C' implementation that we thought would have been the benchmark with which to measure everything else, and the Java version we thought we'd be using in production.

Turns out that laziness, immutability, and referential transparency really helped this particular case.

- Laziness meant that a naively written algorithm was able to stream the data from disk and process it concurrently without blocking. Other implementations had separate buffer and process steps (even if hidden behind BufferedInputStream) that blocked the CPU while loading the next batch of data.

- Immutability meant that the Haskell version could extract sections of the buffer for processing just by returning a new ByteString pointer. Other versions needed to copy the entire section into a new buffer, wasting CPU cycles, memory bandwidth, and cache locality.

- Referential transparency meant that we could trivially run this over multiple cores without additional work.

Naturally, a hand-crafted C version would almost certainly be faster than this - but it would have required a lot more effort and a more complex algorithm to do the same thing. (Explicit multi-threading, a non-standard string library, and a lot of juggling to keep the CPU fed with just the right amount of buffer).

On a per-effort basis, Haskell (From my minimal experience) seems to be one of the more performant languages I've ever used. (That is to say, for a given amount of time and effort, Haskell seems to punch well above its weight. At least for the few things I've used it for so far).

I'm still of the impression that well written C (or Java) will thoroughly trounce Haskell overall, but GHC will really surprise you sometimes.

I haven't used OCaml much - but my understanding is that the GIL makes it quite difficult to write performant multi-threaded code, something that Haskell makes almost effortless.

> Haskell won, even beating the reference 'C' implementation

This has always interested me. I have never gotten an answer, and I suppose I can't seriously expect one now, but I am still compelled to ask:

Why did you put C in quotes up there? Why isn't Haskell in quotes? You didn't put C in quotes in other parts, but that isn't what I'm talking about.

No specific reason really. I didn't think about it at the time, that's just how I typed it.

Probably because C is a single letter, and thus potentially needs some differentiation from the surrounding sentence, whereas Haskell is an actual word. But no idea really.

Thanks for your answer.
because 'C' is a char, while Haskell is a string
But then there should be double quotes.
> You really don't sacrifice anything by using it

What's "it" - Haskell, or referential transparency? Referential transparency definitely has its victims, and debugging is one of them. Debug.Trace is quite useful, and also violates referential transparency. That Haskell provides it is an admission that strict R.T. is unworkable.

> If I'm writing pure code, then order of execution is irrelevant. It simply does not matter. If I'm writing impure code, then I encode order of execution by writing imperative (looking) code using do-notation, and it looks and works just like it would in any imperative language.

Baloney! Haskell's laziness makes the order of execution highly counter-intuitive. Consider:

    import Data.Time.Clock

    main :: IO ()
    main = do
        start <- getCurrentTime
        -- 'return' does not force the product; it stays an unevaluated thunk here
        fact <- return $ product [1..50000]
        end <- getCurrentTime
        putStrLn $ "Computed product " ++ show fact ++
                   " in " ++ show (diffUTCTime end start) ++ " seconds"
This program appears to time a computation of 50000 factorial, but in fact it will always output some absurdly short time. This is because the true order of execution diverges greatly from what the program specifies in the do-notation. This has nothing to do with purity; it's a consequence of laziness.

> Turns out that laziness, immutability, and referential transparency really helped this particular case

I don't buy it. In particular, laziness is almost always a performance loss, which is why a big part of optimizing Haskell programs is defeating laziness by inserting strictness annotations.
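For anyone who hasn't seen a strictness annotation, here is a minimal sketch using only base. The names `Acc` and `mean` are made up for illustration and are not from anyone's code in this thread. The `!` on each field forces the running sum and count at every step, so no chain of thunks builds up while the list is consumed.

```haskell
import Data.List (foldl')

-- Hypothetical accumulator: the '!' marks make both fields strict,
-- so they are evaluated whenever an Acc value is constructed.
data Acc = Acc !Double !Int

-- Arithmetic mean in a single pass; returns 0 for an empty list.
mean :: [Double] -> Double
mean xs = finish (foldl' step (Acc 0 0) xs)
  where
    step (Acc s n) x = Acc (s + x) (n + 1)
    finish (Acc _ 0) = 0
    finish (Acc s n) = s / fromIntegral n
```

Without the annotations (and with a lazy `foldl`), the same code can pile up unevaluated additions until the very end.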

> Laziness meant that a naively written algorithm was able to stream the data from disk and process it concurrently without blocking

This would seem to imply that Haskell will "read ahead" from a file. Haskell does not do that.

> Immutability meant that the Haskell version could extract sections of the buffer for processing just by returning a new ByteString pointer. Other versions needed to copy the entire section into a new buffer

Haskell returns a new pointer to a buffer, while other versions need to copy into a new buffer? This is nonsense.

Like laziness, immutability is almost always a performance loss. This is why ghc attempts to extract mutable values from immutable expressions, e.g. transform a recursive algorithm into an iterative algorithm that modifies an accumulator. This is also why tail recursive functions are faster than non-tail-recursive functions!

> Referential transparency meant that we could trivially run this over multiple cores without additional work

It is not especially difficult to write a referentially transparent function in C. Haskell gives you more confidence that you have done it right, but that measures correctness, not performance.

Standard C knows nothing of threads, while Haskell has some nice tools to take advantage of multiple threads. So this is definitely a point for Haskell, compared to standard C. But introduce any modern threading support (like GCD, Intel's TBB, etc.), and then the comparison would have been more even.

When it comes to parallelization, it's all about tuning. Haskell gets you part of the way there, but you need more control to achieve the maximum performance that your hardware is capable of. In that sense, Haskell is something like Matlab: a powerful prototyping tool, but you'll run into its limits.

"This is because the true order of execution diverges greatly from what the program specifies in the do-notation. This has nothing to do with purity; it's a consequence of laziness."

Of course, that's not what the do notation specifies, but I agree that's somewhat subtle. As you say, it's a consequence of laziness. Replacing "return" with "evaluate" fixes this particular example.

In general, if you care about when some particular thing is evaluated - and for non-IO you usually don't - an IO action that you're sequencing needs to depend upon it. That can either be because looking at the thing determines which IO action is used, or it can be added artificially by means of seq (or conceivably deepseq, if you don't just need WHNF).
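A rough sketch of the `evaluate` fix, written as a reusable helper. `timeWHNF` is a hypothetical name, and I've swapped in `System.CPUTime` from base instead of `Data.Time.Clock` just to keep the snippet dependency-free, so it measures CPU time rather than wall-clock time.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Hypothetical helper: force a value to weak head normal form between
-- two clock reads. Returns the value and the elapsed CPU time in picoseconds.
timeWHNF :: a -> IO (a, Integer)
timeWHNF x = do
    start <- getCPUTime
    y <- evaluate x   -- unlike 'return', this forces x before the next action runs
    end <- getCPUTime
    return (y, end - start)
```

Something like `timeWHNF (product [1..50000 :: Integer])` should then report the cost of the multiplication itself. Note that `evaluate` only goes as far as WHNF, which is enough for an `Integer` but not for, say, a list of unevaluated elements.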

First up - I'll preface my reply below with a big disclaimer that I'm a relative novice with Haskell, so these are purely my opinions at this point in my learning curve.

> What's "it" - Haskell, or referential transparency? Referential transparency definitely has its victims, and debugging is one of them. Debug.Trace is quite useful, and also violates referential transparency. That Haskell provides it is an admission that strict R.T. is unworkable.

I'd disagree that this is any real attack on the merits of referential transparency, since Debug.Trace is not part of application code. It violates referential transparency in the same way an external debugger would. It's an out of band debugging tool that doesn't make it into production.

> Baloney! Haskell's laziness makes the order of execution highly counter-intuitive. Consider

I wouldn't say it makes order of execution highly counter-intuitive, and your above example is pretty intuitive to me. But expanding your point, time and space complexity can be very difficult to reason about - so I'll concede that's really a broader version of your point.

> Haskell returns a new pointer to a buffer, while other versions need to copy into a new buffer? This is nonsense.

C uses null-terminated strings, so in order to extract a substring it must be copied. It also has mutable strings, so standard library functions would need to copy even if the string were properly bounded.

Java uses bounded strings, but still doesn't share characters. If you extract a substring, you're getting another copy in memory.

Haskell, using the default ByteString implementation, can do a 'substring' in O(1) time. This alone was probably a large part of the reason Haskell came out ahead - it wasn't computing faster, it was doing less.

Obviously in Java and C you could write logic around byte arrays directly, but this point was for a naive implementation, not a tuned version.
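To make the O(1) 'substring' point concrete, here is a small sketch assuming the `bytestring` package that ships with GHC. `slice` is a hypothetical helper name, not something from the original program. For strict ByteStrings, `take` and `drop` are documented as O(1): they return a new view onto the same underlying buffer rather than copying bytes.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical helper: a view of 'len' bytes starting at offset 'start'.
-- No bytes are copied; the result shares the original buffer.
slice :: Int -> Int -> B.ByteString -> B.ByteString
slice start len = B.take len . B.drop start
```

The sharing is only safe because nothing can mutate the buffer afterwards, which is the immutability argument being made here.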

> This would seem to imply that Haskell will "read ahead" from a file. Haskell does not do that

It would seem counter-intuitive that the standard library would read one byte at a time. I would put money on the standard file operations buffering more data than needed - and if they didn't, the OS absolutely would.

> Like laziness, immutability is almost always a performance loss.

On immutability -

In a write-heavy algorithm, absolutely. Even Haskell provides mutable data structures for this very reason.
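As one illustration of the mutable side, here is a minimal sketch using `STRef` from base. The function name `sumST` is made up for the example. The mutation is real, but `runST` keeps it local, so callers still see an ordinary pure function.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- Hypothetical example: sum a list by updating a mutable cell in place.
-- The mutable reference cannot escape runST, so sumST is pure from outside.
sumST :: [Int] -> Int
sumST xs = runST $ do
    ref <- newSTRef 0
    mapM_ (\x -> modifySTRef' ref (+ x)) xs   -- strict in-place update
    readSTRef ref
```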

But in a read-heavy algorithm (such as my example above) immutability allows us to make assumptions about the data - such as the fact that it'll never change. This means that the standard platform library can, for example, implement substring in O(1) time complexity instead of having to make a defensive copy of the relevant data (lest something else modify it).

On Laziness -

I'm still relatively fresh to getting my head around laziness, so take this with a grain of salt. But my understanding, from what I've been told and from some personal experience:

In completely CPU bound code, laziness is likely going to be a slowdown. But laziness can also make it easier to write code in ways that would be difficult in strict languages, which can lead to faster algorithms with the same effort. In this particular example, it was much easier to write this code using streaming non-blocking IO than it would be in C.

> It is not especially difficult to write a referentially transparent function in C. Haskell gives you more confidence that you have done it right, but that measures correctness, not performance.

Except that GHC can do some clever optimizations with referential transparency that a C compiler (probably) wouldn't - such as running naively written code over multiple cores.

> When it comes to parallelization, it's all about tuning. Haskell gets you part of the way there, but you need more control to achieve the maximum performance that your hardware is capable of. In that sense, Haskell is something like Matlab: a powerful prototyping tool, but you'll run into its limits.

I completely agree. If you need bare to the metal performance, then carefully crafted C is likely to still be the king of the hill for a very long time. Haskell won't even come close.

But in day to day code, we tend to not micro-optimize everything. We tend to just write the most straightforward code and leave it at that. Haskell, from my experience so far, for the kinds of workloads I'm giving it (IO-bound CRUD apps, mostly) tends to provide surprisingly performant code under these conditions. I'm under no illusion that it would even come close to C if it came down to finely tuning something, however.

>That Haskell provides it is an admission that strict R.T. is unworkable.

Perhaps it is, but that doesn't mean it's not immensely valuable as a default. And it's worth noting that in the case of Debug.Trace, the actual program is still referentially transparent, it's just the debugging tools that break the rules, as they often do.

>Haskell's laziness makes the order of execution highly counter-intuitive.

Yes, there are some use cases where do-notation doesn't capture all the side effects (i.e. time/memory) and so a completely naive imperative perspective breaks down. But these cases are rare, and it's not that hard to learn to deal with them.

A really great rebuttal of his points. I like Haskell, I really do - but I can never get any useful work done out of it. (Note: I am a hobbyist and not a professional programmer)
It's not a great rebuttal, it's just showing why people with imperative mindsets don't really understand Haskell still. The rebuttal rebuttal is good.

Do notation is not specifically a line-by-line imperative thing, and complaining that it isn't that doesn't make it bad. Obviously, the goal in Haskell isn't precisely to do imperative coding. It remains true that you can hack imperative code into Haskell in various ways effectively.

Haskell has a slight, but consistent, edge over Ocaml in most of the "benchmarks game" tests, so it's certainly not true to say that Ocaml beats Haskell in all benchmarks (although, certainly there could be other benchmarks where it does). In either case, Haskell is very performant and is competitive with any other mainstream language in performance.

You can write print statements to do debugging (with Debug.Trace), and in practice it's not very hard to work IO into your code when you need it (even if only for temporary debugging or development). Crucially, however, it's much harder to accidentally work IO into your code. The few cases where I really miss print statements "for free" are vastly outweighed by the many cases in impure languages where I'm accidentally mismanaging my mutable state.

Whether it results in more performant code? In some cases yes (the restrictions make it much easier to prove certain compiler optimizations), but that's not really the point. Referential transparency is about making your code more expressive, and easier to reason about, to design, and to safely tweak.

Haskell performance is very good when written by people who know how the compiler works, and know the bytecode they want generated. I.e., if you rewrite a recursive function in a slightly unintuitive way and apply the right strictness annotations, it will compile down to the same bytecode as a for-loop in C.

Idiomatic Haskell is not generally as fast as mutable C/Java/etc. Creating/evaluating thunks is not fast and immutable data structures often result in excess object creation. When you need them, there is no real substitute for unboxed mutable arrays, something Haskell does NOT make easy.

Haskell is one of my favorite languages, the performance story just isn't quite what I want it to be. I do, however, think that there is plenty of room for improvement, i.e. there is no principled reason Haskell can't compete.

This is exactly where I'm at. My biggest problem is that wrapping non-persistent data structures written in C/C++ never seems to come out right in Haskell. You often have to write them in the IO monad, which is the absolute last thing you want for an otherwise general purpose data structure. I think there may be some solution here using linear types, which enforce that a data type is referenced only once at compile time. This would let you avoid being forced to guarantee persistence when all you care about is speed.

This argument may seem more abstract than what you mention, but in fact it gets to the very heart of why there aren't good unboxed mutable arrays in haskell. In truth, there are. You can convert Immutable Vectors (which are lists with O(1) indexing but no mutation) into Mutable Vectors in constant time using unsafeThaw. The problem is that your code is no longer persistent, and you've risked introducing subtle errors. My biggest problem is that the haskell community seems to look at non-persistent data structures as sacrilegious. As a scientific programmer, that makes me feel like maybe learning haskell wasn't such a good investment after all. But on the bright side, functional programming is on the rise, and I'm confident that all my experience with Haskell will transfer well in the future.

>Haskell performance is very good when written by people who know how the compiler works

I know nothing about how the compiler works, and my haskell code still easily outperforms my clojure code. The only optimizations I do are the same as anywhere else: profile and look at functions taking up too much time.

>and know the bytecode they want generated.

Bytecode is not involved. Machine code is, but I don't even know ASM to know what I want generated or if it is being generated that way.

>When you need them, there is no real substitute for unboxed mutable arrays, something Haskell does NOT make easy.

This is simply nonsense. Unboxed mutable vectors are trivial in haskell: https://hackage.haskell.org/package/vector-0.10.11.0/docs/Da... No, there is no substitute for using the right data types. Why do you think haskell or haskellers suggest using the wrong data types?

My goal is "as fast as C (TM)". Clojure is not known for being a speed demon.

I didn't say you couldn't do arrays with Haskell, I said Haskell doesn't make it easy. Here are the actual array docs, BTW: http://www.haskell.org/haskellwiki/Arrays

Out of curiosity, what's difficult about that mutable array implementation?

I'm a relative Haskell novice, but was able to write some mutable array code with only a cursory read through the documentation.

Granted it's extremely verbose compared to most imperative languages.

>My goal is "as fast as C (TM)".

Enjoy using C then. You suggested that haskell was bad because it was not fast enough. If "not as fast as C" is not fast enough, then virtually every language is not just bad, but much worse than haskell.

>I said Haskell doesn't make it easy

And I showed you that it is in fact trivially easy.

>Here are the actual array docs, BTW

That is a random, user-edited wiki page. I linked to the actual docs.

Depends a lot on the libraries, too. I had to scrape a bunch of HTML recently, which I prefer to use XPath for; the library I used -- HXT, if I remember correctly, it was the horrible one that uses arrows -- made my program perform on par with Ruby, and when I benchmarked it, I found it was allocating about 2GB of data throughout the program, while parsing a document that was probably around 100KB.
I believe HXT uses the default representation of strings as lists of chars, instead of more efficient packed representations. This likely contributes to the excessive memory usage.
Sure. As a decidedly unseasoned Haskell user, however, it's hard to sympathize with inefficient libraries for something as established as XML.

There may be other, faster libs that I don't know about, but I couldn't find them. I tried HaXml first (from which HXT is apparently derived), but the parser choked on my document and the author didn't come forward with a fix when I reported the problem (by email, the project isn't on Github). There is one called HXML, but I think it's dead. The TagSoup library might have worked, but I don't think so. It's not easy jumping into a new language and then coming up against library issues that prevent you from finishing your first project.

> performance is very good when written by people who know how the compiler works, and know the bytecode they want generated.

Yes.

The old situation with list processing in which the decision to fold from left or from right can make a big performance difference might be the fundamental example of this kind of problem. It is enough to make me think twice about the wisdom of defining lists recursively. It definitely doesn't feel "declarative", which attribute is surely more important than elegant simplicity of implementation.
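A small sketch of the left-versus-right trade-off, using only base. `sumLeft` and `anyRight` are hypothetical names for illustration. A strict left fold runs in constant stack space for a reduction like a sum, while a right fold with a lazy combining function can stop early, even on an infinite list.

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is forced at each step,
-- so summing a long list does not build a chain of thunks.
sumLeft :: [Integer] -> Integer
sumLeft = foldl' (+) 0

-- Right fold with a lazy operator: (||) ignores its second argument
-- once the first is True, so the rest of the list is never visited.
anyRight :: (a -> Bool) -> [a] -> Bool
anyRight p = foldr (\x acc -> p x || acc) False
```

Swapping the two (a `foldr` for the sum, or a left fold for the search) still type-checks, which is exactly why the choice feels less than declarative.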
>sacrifice the ability to write print statements to do debugging

No.

>lose the ability to reason about order of execution

No.

>But does it really result in more performant code?

That is not the goal. The goal is being able to reason about the code, and write code that is correct. The fact that it performs very well is due to a high quality compiler, not purity.

> Every benchmark I've ever seen, more practical languages like Ocaml have come out on top.

Doesn't look that way from here: http://benchmarksgame.alioth.debian.org/u32/ocaml.php How exactly is a language that is unable to handle parallelism "more practical" than one that handles it better than virtually any other language?

OCaml is more practical than ML and Haskell because it has objects, for loops, more edge cases in the language, built in mutable keyword, and extensible records.
No it is not. Ocaml's objects make it less practical, not more. That is why they are virtually completely unused. At best, for loops are irrelevant. I'd say they are closer to a negative than irrelevant though. What do you mean "more edge cases?" That the language is less safe? How is that practical? Haskell has mutable references too, with the added benefit of them being type safe. And haskell has extensible records, they are just a library like anything else: http://hackage.haskell.org/package/vinyl
> That the language is less safe?

Not necessarily.

> And haskell has extensible records, they are just a library like anything else:

And OCaml has monads, they are just a library like anything else.

>Not necessarily.

Then what? You made the vague statement, make it not vague.

>And OCaml has monads, they are just a library like anything else.

And? I did not claim ocaml lacks monads. You claimed haskell lacks extensible records. You do understand that my post was a direct reply to what you said right? Not just some random things I felt like saying for no particular reason.

Monads are arguably a library in Haskell, too... though one the standard guarantees is present, exposed by the Prelude, and relied on by a lot of code.
Actually, Haskell does let you write print statements for debugging.

If we have the following function:

    foo :: Int -> Int
    foo x = x `div` 0
and we want to add debugging, we can do:

    import Debug.Trace

    foo :: Int -> Int
    foo x
      | trace (show x) False = undefined
      | otherwise = x `div` 0
The above will print the value of x before throwing an error due to division by zero. You don't have to make foo return an IO Int or change any other aspect of your program.
Something I like to do is:

    import Debug.Trace
    wtf v x = trace (show x) v

    someFunction x = anotherFunction x `wtf` x
This allows me to tack `wtf` onto the end of otherwise unchanged expressions to do print debugging