by yummyfajitas 4274 days ago
Haskell performance is very good when written by people who know how the compiler works, and know the bytecode they want generated. I.e., if you rewrite a recursive function in a slightly unintuitive way and apply the right strictness annotations, it will compile down to the same bytecode as a for-loop in C.
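
A minimal sketch of the kind of rewrite described above, assuming GHC with optimization enabled (e.g. `-O2`). The names `sumTo` and `go` are illustrative, not from any library. A strict, tail-recursive worker like this is typically compiled to a tight loop over unboxed integers, though the exact output depends on compiler version and flags.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Sum the integers 1..n with an explicit accumulator.
-- The bang patterns force acc and i on every call, so no chain of
-- unevaluated thunks builds up while the loop runs.
sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go :: Int -> Int -> Int
    go !acc !i
      | i > n     = acc
      | otherwise = go (acc + i) (i + 1)

main :: IO ()
main = print (sumTo 1000000)
```

Without the bangs, GHC's strictness analysis often reaches the same result at `-O`, but the annotations make the intent explicit and also help in unoptimized builds.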

Idiomatic Haskell is not generally as fast as mutable C/Java/etc. Creating/evaluating thunks is not fast and immutable data structures often result in excess object creation. When you need them, there is no real substitute for unboxed mutable arrays, something Haskell does NOT make easy.

Haskell is one of my favorite languages, the performance story just isn't quite what I want it to be. I do, however, think that there is plenty of room for improvement, i.e. there is no principled reason Haskell can't compete.

5 comments

This is exactly where I'm at. My biggest problem is that wrapping non-persistent data structures written in C/C++ never seems to come out right in Haskell. You often have to write them in the IO monad, which is the absolute last thing you want for an otherwise general purpose data structure. I think there may be some solution here using linear types, which enforce at compile time that a value is referenced only once. This would let you avoid being forced to guarantee persistence when all you care about is speed.

This argument may seem more abstract than what you mention, but in fact it gets to the very heart of why there aren't good unboxed mutable arrays in haskell. In truth, there are. You can convert Immutable Vectors (which are lists with O(1) indexing but no mutation) into Mutable Vectors in constant time using unsafeThaw. The problem is that your code is no longer persistent, and you've risked introducing subtle errors. My biggest problem is that the haskell community seems to look at non-persistent data structures as sacrilegious. As a scientific programmer, that makes me feel like maybe learning haskell wasn't such a good investment after all. But on the bright side, functional programming is on the rise, and I'm confident that all my experience with Haskell will transfer well in the future.
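
A rough sketch of the thaw/freeze pattern mentioned above, assuming the third-party `vector` package is installed. `setFirst` is a made-up name for illustration. This version uses the copying `U.thaw`, so the input stays persistent; swapping in `U.unsafeThaw` would skip the copy, but then the original vector must never be read again, which is exactly the risk described above.

```haskell
module Main where

import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed as U
import qualified Data.Vector.Unboxed.Mutable as M

-- Overwrite element 0 behind a pure interface.
-- U.thaw copies the input (O(n)), so callers still see the old vector.
-- U.unsafeFreeze is O(1) and is safe here because mv never escapes
-- and is not touched after freezing.
setFirst :: Double -> U.Vector Double -> U.Vector Double
setFirst x v
  | U.null v  = v
  | otherwise = runST $ do
      mv <- U.thaw v
      M.write mv 0 x
      U.unsafeFreeze mv

main :: IO ()
main = do
  let v  = U.fromList [1, 2, 3] :: U.Vector Double
      v' = setFirst 10 v
  print v   -- the original vector, unchanged by setFirst
  print v'  -- the updated copy
```

The mutation lives in `ST` rather than `IO`, so the function keeps an ordinary pure type.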

>Haskell performance is very good when written by people who know how the compiler works

I know nothing about how the compiler works, and my haskell code still easily outperforms my clojure code. The only optimizations I do are the same as anywhere else: profile and look at functions taking up too much time.

>and know the bytecode they want generated.

Bytecode is not involved. Machine code is, but I don't even know ASM to know what I want generated or if it is being generated that way.

>When you need them, there is no real substitute for unboxed mutable arrays, something Haskell does NOT make easy.

This is simply nonsense. Unboxed mutable vectors are trivial in haskell: https://hackage.haskell.org/package/vector-0.10.11.0/docs/Da... No, there is no substitute for using the right data types. Why do you think haskell or haskellers suggest using the wrong data types?

My goal is "as fast as C (TM)". Clojure is not known for being a speed demon.

I didn't say you couldn't do arrays with Haskell, I said Haskell doesn't make it easy. Here are the actual array docs, BTW: http://www.haskell.org/haskellwiki/Arrays

Out of curiosity, what's difficult about that mutable array implementation?

I'm a relative Haskell novice, but was able to write some mutable array code with only a cursory read through the documentation.

Granted it's extremely verbose compared to most imperative languages.

>My goal is "as fast as C (TM)".

Enjoy using C then. You suggested that haskell was bad because it was not fast enough. If "not as fast as C" is not fast enough, then virtually every language is not just bad, but much worse than haskell.

>I said Haskell doesn't make it easy

And I showed you that it is in fact trivially easy.

>Here are the actual array docs, BTW

That is a random, user-edited wiki page. I linked to the actual docs.

>If "not as fast as C" is not fast enough, then virtually every language is not just bad, but much worse than haskell.

I agree. The only languages I've used that are remotely competitive for my purposes are static JVM languages (Java and Scala), Ocaml, and Julia for array ops. Haskell comes closer than many others, but just isn't there yet.

The docs you linked to are a 3rd party package marked "experimental". I'll also suggest that you are glossing over most of the difficulties in using them. It's trivially easy to call `unsafeRead`. It's not so easy to wrap your operations in the appropriate monad, apply all the necessary strictness annotations to avoid thunks, and properly weave this monad with all the others you've got floating around.

(That last bit is fairly important if you plan to write methods like `objectiveGradient dataPoint workArray`.)

>(Java and Scala), Ocaml,

Except scala and ocaml are both slower than haskell.

>The docs you linked to are a 3rd party package marked "experimental".

No it is not. What is the point of just outright lying?

>I'll also suggest that you are glossing over most of the difficulties in using them

I'll suggest that if you want people to believe your claim, then you should back it up. Show me the difficulty. Because my week 1 students have no trouble with it at all.

>It's not so easy to wrap your operations in the appropriate monad

You are literally saying "it is not easy to write code". That is like saying "printf" is hard in C because you have to write code. It makes absolutely no sense. Have you actually ever tried learning haskell much less using it?

>apply all the necessary strictness annotations to avoid thunks

All one of them? Which goes in the exact same place it always does? And which is not necessary at all?

>and properly weave this monad with all the others you've got floating around.

Ah, trolled me softly. Well done.

I don't know why you are responding so angrily. The page you linked to explicitly says "Stability experimental" in the top right corner.

I also don't know why you are behaving as if I dislike Haskell. I enjoy Haskell a lot, I just find getting very good performance to be difficult. You can browse my comment history to see a generally favorable opinion towards Haskell if you don't believe me.

I also gave you a concrete example of a reasonable and necessary task I found difficult: specifically, numerical functions which need to mutate existing arrays rather than allocating new ones, e.g. gradient descent. Every time I've attempted to implement such things in Haskell, it takes me quite a bit of work to get the same performance that Scala/Java/Julia/C gives me out of the box (or Python after using Numba).
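
To make the task concrete, here is one possible sketch of an in-place gradient-descent loop, using `STUArray` from the `array` package that ships with GHC. Everything here is an assumption for illustration: the names `stepInPlace`, `gradInto`, and `descend` are invented, and the objective is a toy one, the sum of (w_i - t_i)^2. It says nothing about how much tuning a real workload would need.

```haskell
module Main where

import Control.Monad (forM_, replicateM_)
import Control.Monad.ST (ST)
import Data.Array.ST
  (STUArray, getBounds, newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, bounds, elems, listArray, (!))

-- w[i] <- w[i] - rate * grad[i], mutating w in place.
-- Assumes w and grad have the same bounds.
stepInPlace :: Double -> STUArray s Int Double -> STUArray s Int Double -> ST s ()
stepInPlace rate w grad = do
  (lo, hi) <- getBounds w
  forM_ [lo .. hi] $ \i -> do
    wi <- readArray w i
    gi <- readArray grad i
    writeArray w i (wi - rate * gi)

-- Gradient of sum_i (w_i - t_i)^2, written into a preallocated work array.
gradInto :: UArray Int Double -> STUArray s Int Double -> STUArray s Int Double -> ST s ()
gradInto target w grad = do
  (lo, hi) <- getBounds w
  forM_ [lo .. hi] $ \i -> do
    wi <- readArray w i
    writeArray grad i (2 * (wi - target ! i))

-- Run a fixed number of steps from w = 0, allocating only two arrays in total.
descend :: Int -> Double -> UArray Int Double -> UArray Int Double
descend steps rate target = runSTUArray $ do
  let (lo, hi) = bounds target
  w    <- newArray (lo, hi) 0
  grad <- newArray (lo, hi) 0
  replicateM_ steps $ do
    gradInto target w grad
    stepInPlace rate w grad
  return w

main :: IO ()
main = do
  let target = listArray (0, 2) [1.0, -2.0, 0.5] :: UArray Int Double
  print (elems (descend 50 0.25 target))
```

Because the arrays are unboxed, every write stores an evaluated `Double`, so no thunks accumulate in the elements. The friction complained about above shows up once this `ST` code has to be combined with other effects.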

Depends a lot on the libraries, too. I had to scrape a bunch of HTML recently, which I prefer to use XPath for; the library I used -- HXT, if I remember correctly, it was the horrible one that uses arrows -- made my program perform on par with Ruby, and when I benchmarked it, I found it was allocating about 2GB of data throughout the program, while parsing a document that was probably around 100KB.

I believe HXT uses the default representation of strings as lists of chars, instead of more efficient packed representations. This likely contributes to the excessive memory usage.

Sure. As a decidedly unseasoned Haskell user, however, it's hard to sympathize with inefficient libraries for something as established as XML.

There may be other, faster libs that I don't know about, but I couldn't find them. I tried HaXml first (from which HXT is apparently derived), but the parser choked on my document and the author didn't come forward with a fix when I reported the problem (by email, the project isn't on GitHub). There is one called HXML, but I think it's dead. The TagSoup library might have worked, but I don't think so. It's not easy jumping into a new language and then coming up against library issues that prevent you from finishing your first project.

The "String problem" is definitely one of the most unfortunate parts of Haskell. Using a linked list of chars for a string is just laughable from a performance and resources standpoint. The good news is that the problem should be solved now: we have Data.Text for unicode strings, and Data.ByteString for binary/ASCII/UTF-8 strings. Both are very efficient and implement a robust API for common string operations. The bad news is that there are still far too many libraries that use the old crummy data type for strings, including much of the Prelude. And, I guess in the interest of simplicity, many beginner tutorials tend to use String as well. This is quite unfortunate, but it does seem to be changing: Aeson uses Data.Text, the ClassyPrelude ditches String almost entirely (keeping it for Show only), and in general most modern libraries avoid String.

Hopefully HXT will be updated to use modern string types soon. In the meantime, I believe that xml-conduit (http://hackage.haskell.org/package/xml-conduit) might be what is desired.
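
For anyone who hasn't used the packed types, a small sketch of working in `Data.Text` and converting only at the boundary with String-based code. It assumes the `text` package, which is bundled with current GHC installs. `wordCount` and `shout` are illustrative names.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Count whitespace-separated words in a packed Text value.
wordCount :: T.Text -> Int
wordCount = length . T.words

-- Adapter for an API that still speaks String: pack on the way in,
-- unpack on the way out, and do the real work on Text.
shout :: String -> String
shout = T.unpack . T.toUpper . T.pack

main :: IO ()
main = do
  let t = "the quick brown fox" :: T.Text  -- literal is Text via OverloadedStrings
  TIO.putStrLn (T.toUpper t)
  print (wordCount t)
  putStrLn (shout "legacy string")
```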

Why don't you think TagSoup would have worked? I've used it for quite a few use cases.

edit: Then to make things dead simple, add on dom-selector:

http://hackage.haskell.org/package/dom-selector

It enables using css selectors like so:

    queryT [jq| h2 span.titletext |] root

The project wasn't that recent, so I don't quite remember, but I would have wanted something like dom-selector, and that one didn't come up in my searches for solutions.

It's interesting that XML libs have to invent operators and obnoxious syntax (like HXT's arrow usage, or coincidentally the fact that HXT's parser uses the IO type, which is just crazy talk). dom-selector seems to have the same problem. I prefer readable functions, not DSLs where my code suddenly descends into this magic bizarro-world of operator soup for a moment.

Lenses would make tree-based extraction easier, I think, although lenses aren't easy to understand or that easy to read. Tree traversal with lenses and zippers seems unnecessarily complicated to me.

In a scraper you just want to collect items recursively, and return empty/Nothing values for anything that fails a match: Collect every item that contains a <div class="h-sku productinfo">, map its h2 to a title and its <div class="price"> to a price, and then combine those two fields into a record. It's something that should result in eminently readable code, not just because it's a conceptually trivial task, but also because someday you need to go back to the code and remember how it works.
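
As a sketch of what that could look like with plain functions, here is a toy version over a hand-rolled tree type. Nothing here is a real library API: `Node`, `Product`, `descendants`, `firstText`, and `scrape` are all invented for illustration, a real program would get the tree from a parser, and for simplicity the `h-sku productinfo` div itself is treated as the item.

```haskell
module Main where

import Data.Maybe (listToMaybe, mapMaybe)

-- A toy HTML tree (hypothetical; stands in for a parser's output).
data Node
  = Element String [(String, String)] [Node]  -- tag, attributes, children
  | TextNode String

data Product = Product { title :: String, price :: String }
  deriving (Show, Eq)

children :: Node -> [Node]
children (Element _ _ cs) = cs
children (TextNode _)     = []

-- The node itself plus all nodes below it that satisfy the predicate,
-- in document order.
descendants :: (Node -> Bool) -> Node -> [Node]
descendants p n = [n | p n] ++ concatMap (descendants p) (children n)

hasTag :: String -> Node -> Bool
hasTag t (Element t' _ _) = t == t'
hasTag _ _                = False

-- True if the space-separated class attribute contains c.
hasClass :: String -> Node -> Bool
hasClass c (Element _ attrs _) =
  maybe False ((c `elem`) . words) (lookup "class" attrs)
hasClass _ _ = False

textOf :: Node -> String
textOf (TextNode s)     = s
textOf (Element _ _ cs) = concatMap textOf cs

-- Text of the first match, or Nothing if there is no match.
firstText :: (Node -> Bool) -> Node -> Maybe String
firstText p = fmap textOf . listToMaybe . descendants p

-- Combine the two fields; Nothing if either one is missing.
toProduct :: Node -> Maybe Product
toProduct n =
  Product <$> firstText (hasTag "h2") n
          <*> firstText (\x -> hasTag "div" x && hasClass "price" x) n

scrape :: Node -> [Product]
scrape = mapMaybe toProduct . descendants isItem
  where
    isItem x = hasTag "div" x && hasClass "h-sku" x && hasClass "productinfo" x

sample :: Node
sample = Element "body" []
  [ Element "div" [("class", "h-sku productinfo")]
      [ Element "h2" [] [TextNode "Widget"]
      , Element "div" [("class", "price")] [TextNode "$5"] ]
  , Element "div" [("class", "h-sku productinfo")]
      [ Element "h2" [] [TextNode "No price listed"] ]
  ]

main :: IO ()
main = print (scrape sample)
```

Items that fail a match (the second div has no price) come back as `Nothing` and are dropped by `mapMaybe`.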

> I prefer readable functions, not DSLs where my code suddenly descends into this magic bizarro-world of operator soup for a moment.

Bizarro world of operator soup? I don't really follow you. That dom selector code just compiles down into functions itself. I don't see how anything could be any clearer than a css selector for selecting an html element.

> performance is very good when written by people who know how the compiler works, and know the bytecode they want generated.

Yes.

The old situation with list processing, in which the decision to fold from the left or from the right can make a big performance difference, might be the fundamental example of this kind of problem. It is enough to make me think twice about the wisdom of defining lists recursively. It definitely doesn't feel "declarative", and that attribute is surely more important than elegant simplicity of implementation.
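
For readers who haven't hit this, a small sketch of the fold-direction issue using only `base`. The function names are illustrative. Whether the lazy `foldl` version actually blows up depends on optimization level, since GHC's strictness analysis often rescues it at `-O`.

```haskell
module Main where

import Data.List (foldl')

-- Lazy left fold: without optimization this can build a long chain of
-- (+) thunks before anything is added.
sumLazy :: [Int] -> Int
sumLazy = foldl (+) 0

-- Strict left fold: the accumulator is forced at each step, so the
-- sum runs in constant space.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- Right fold: the better fit when the result is produced lazily.
-- It works even on an infinite input because each cons cell is
-- returned before the rest of the list is examined.
firstEvens :: Int -> [Int]
firstEvens n =
  take n (foldr (\x acc -> if even x then x : acc else acc) [] [1 ..])

main :: IO ()
main = do
  print (sumStrict [1 .. 1000000])
  print (firstEvens 5)
```

Same list, same operator, three different evaluation behaviours, and the reader has to know which one to pick.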