by windlep 4068 days ago
I really like Haskell, but one of the main problems I've had (that I don't see many people cite) is that the libraries just aren't made for use under serious load/concurrency. Many of the people who have written and use these libraries are not using them in high-performance, memory-sensitive settings (production use at companies).

There are Haskell libs of course that are used in these environments, and the companies usually end up fixing them such that they're quite good. Most libs used by pandoc are likely to be great, and there are a few dozen others of the same caliber (it's useful to search around and see what libs are used by the few other companies using Haskell, since they have likely been vetted as well).

The other big issue with actually using Haskell is that all the knowledge your ops team has of running a production system is essentially null and void. All your existing knowledge of how to fix performance issues: null and void. Learning Haskell and becoming productive in it almost starts to look like the easy part compared to effectively running a Haskell system (dealing with space leaks, memory fragmentation issues, and GHC tuning for stack sizes, allocations, etc).

3 comments

I've actually found the exact opposite. The library ecosystem is rich and mature. Haskell is, by default, "concurrency safe" because of referential transparency. You can safely "async" and compose almost any library in the Haskell ecosystem without worrying about shared memory etc underneath.
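To make the "safely async and compose" point concrete, here is a minimal sketch. It assumes the `async` package is installed; `shout` is a hypothetical stand-in for any pure library function. Because it is pure, running it from several threads at once needs no locking.

```haskell
module Main where

import Control.Concurrent.Async (mapConcurrently)
import Data.Char (toUpper)

-- Hypothetical pure function standing in for a library call.
-- It has no side effects, so concurrent use cannot corrupt shared state.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = do
  -- Each element is processed in its own lightweight thread.
  -- ($!) forces the length inside the worker thread, not in the caller.
  results <- mapConcurrently (\w -> return $! length (shout w))
                             ["alpha", "beta", "gamma"]
  print results  -- prints [5,4,5]
```

This only covers pure code. Libraries that do IO (connection pools, sessions) still need their own thread-safety guarantees.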

Also, a lot of the really common libraries like text, attoparsec (parsers), aeson, networking, etc are highly tuned for low latency and performance. Many use compiler rewrite rules and a technique called stream fusion to optimize away a lot of the intermediate work. Aggressive inlining can be applied too.
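For anyone who hasn't seen a rewrite rule, here is a small sketch of the mechanism using GHC's `RULES` pragma. `myMap` is a hypothetical function defined only for this example, not something from text or attoparsec. The rule tells GHC it may replace two list traversals with one. Rules only fire when compiling with optimization (`-O`), and the program prints the same result either way.

```haskell
module Main where

-- Hypothetical list map, defined locally for illustration.
-- NOINLINE keeps calls to it visible so the rule below can match them.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []       = []
myMap f (x : xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Rewrite rule: two traversals become one, with no intermediate list.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}

main :: IO ()
main = print (myMap (+ 1) (myMap (* 2) [1, 2, 3 :: Int]))  -- prints [3,5,7]
```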

I'm sure there are some memory-heavy or poorly optimized libraries out there but that's certainly not the norm. I've had no problems with the libraries off-the-rack.

I actually thought that too, but I guess that's not the case. I helped write some HTTP2 frame-parsers for Haskell using attoparsec, but apparently it wasn't fast enough as the lib author later rewrote all the attoparsec code to use pointers to the underlying byte buffers.

https://github.com/kazu-yamamoto/http2/commit/0a3b03a22df1ca...

The stream fusion stuff is sweet, but not exactly unique to Haskell, since any language with good iterator/generator abstractions has similar constant-space memory characteristics.

I believe you're misunderstanding what stream fusion is. A compiler does not need "good iterator/generator abstractions" so much as a guarantee of side-effect-free transformations in order to be able to deforest the intermediate data structures. http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.104.7...

I meant more that languages with an iterator/generator usually have similar constant space usage. That is admittedly a drastic oversimplification of stream fusion, and it leaves out the other practical outcomes you mention, along with a variety of optimizations.

I found this posting a little more approachable for seeing the various optimizations possible with stream fusion: https://donsbot.wordpress.com/2008/06/04/haskell-as-fast-as-...
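As a rough illustration of the idea being argued about, here is a simplified sketch of the stream representation that fusion frameworks are built on. The names (`Stream`, `Step`, `mapS`, `filterS`, `sumS`) are chosen for this example and are not the API of any particular library. Each transformation is a non-recursive wrapper around a step function, so the only loop is in the final consumer and no intermediate list is built between the stages.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
module Main where

-- One step of a stream: finished, skip an element, or yield one.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function plus a hidden state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Convert a list into a stream; the list itself is the state.
stream :: [a] -> Stream a
stream = Stream next
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- Non-recursive: only wraps the step function.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- Non-recursive: rejected elements become Skip.
filterS :: (a -> Bool) -> Stream a -> Stream a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

-- The single loop of the whole pipeline, with a strict accumulator.
sumS :: Stream Int -> Int
sumS (Stream next s0) = go 0 s0
  where
    go acc s = case next s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield x s' -> let acc' = acc + x in acc' `seq` go acc' s'

main :: IO ()
main = print (sumS (mapS (* 2) (filterS even (stream [1 .. 10]))))  -- prints 60
```

Real libraries pair a representation like this with rewrite rules and inlining so that ordinary list-style code gets converted to it automatically. That part is left out here.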

Haskell has a problem in that people think of the basic libraries as deprecated, but won't actually deprecate them because of backward compatibility. Thus, people starting out will pick up plenty of slow and unsafe constructs, while people used to the language only look at the fast and safe ones.

There should be warnings all over the Prelude and basic libraries documentation.
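A small sketch of the kind of thing meant here, using only `base`. The choice of `foldl` and `head` as examples is mine, not the commenter's. Lazy `foldl` can build a long chain of thunks before evaluating anything (a classic space leak, though GHC's optimizer sometimes rescues it), and `head` crashes on an empty list. The function names below are made up for the example.

```haskell
module Main where

import Data.List (foldl')
import Data.Maybe (listToMaybe)

-- Lazy left fold: may accumulate unevaluated (+) thunks for the whole list.
leakySum :: [Int] -> Int
leakySum = foldl (+) 0

-- Strict left fold: forces the accumulator at every step.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0

-- Total alternative to Prelude's partial `head`.
safeHead :: [a] -> Maybe a
safeHead = listToMaybe

main :: IO ()
main = do
  print (strictSum [1 .. 1000000])  -- prints 500000500000
  print (leakySum [1 .. 1000])      -- same answer, different memory behaviour
  print (safeHead ([] :: [Int]))    -- prints Nothing instead of crashing
```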

Yup, agreed. The #haskell channel on Freenode has been great about providing feedback on what libraries one should use for performance.

Could you name some of these libraries? IME, most of the libraries that are needed for common things are very mature.

I tried to write a Haskell websocket server, the library is quite nice, but it leaked memory (space leak? fragmentation? some of both?): https://github.com/jaspervdj/websockets/issues/72

The author helped me narrow it down to some issues with how GHC by default allocates a stack that is rarely big enough, and once it starts growing the stack the RAM per connection gets pretty ridiculous. Using a higher default stack size helped remedy this some, but the per-connection RAM cost was still way higher than the Golang/Python versions I was comparing against.
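For anyone who wants to experiment with this, here is a minimal sketch that parks a number of idle threads so you can compare memory use under different stack settings. The thread count is arbitrary, and the flag values in the comments are placeholders to experiment with, not recommendations and not the values used in the issue above.

```haskell
module Main where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forM_, replicateM)

-- Build with RTS options enabled, for example:
--   ghc -O2 -threaded -rtsopts Main.hs
-- Then compare the memory figures reported by -s under different settings:
--   ./Main +RTS -s
--   ./Main +RTS -s -ki4k -kc64k
-- -ki sets the initial stack size of each new thread and -kc sets the
-- size of the chunks added as a stack grows.

main :: IO ()
main = do
  let n = 10000 :: Int           -- arbitrary number of idle "connections"
  gate  <- newEmptyMVar
  dones <- replicateM n $ do
    done <- newEmptyMVar
    -- Each thread blocks on the gate, like a mostly idle connection.
    _ <- forkIO (readMVar gate >> putMVar done ())
    return done
  putMVar gate ()                -- release every waiting thread
  forM_ dones takeMVar           -- wait for all of them to finish
  putStrLn ("ran " ++ show n ++ " threads")
```

Idle threads like these never grow their stacks, so this only shows the baseline per-thread cost. A real server thread that recurses deeply or parses large frames will behave differently.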

So... separate project, I write a load-tester in Haskell for a websocket server. I need to issue some HTTP requests, and I see Bryan O'Sullivan made a nice library, wreq. I use it as described and quickly discover it uses ridiculous amounts of memory, because the docs don't mention that you should always re-use the Session (the underlying http-client emphasizes the importance of re-using the Manager): https://github.com/bos/wreq/issues/17

(I am sorry that this issue prolly came off as a bit whiny there, I was very frustrated that such a gap was omitted from the docs)

So, my program is working pretty nicely, until I discover that it's not actually sending multiple HTTP requests at once (even though the underlying http-client lib has a thread-safe TCP connection pool). After browsing some code, I see the problem: https://github.com/bos/wreq/issues/57

The solution that was so far implemented seems equally weird to me.... letting different requests stomp over the Session's cookie jar... I forked it so that I could have multiple wreq Sessions use the same Manager, and now it finally works as it should.

I won't even go into how some of these libs have occasionally wanted conflicting dependencies, which leads into its own 'cabal hell' (googling for that is entertaining unless it's happening to you).

I've only been writing Haskell for a bit over a year now, but every time I write code with it, despite my love of the language, the libraries and run-time end up frustrating me.

Great comments and better bug reports, just want to mention it actually says to always re-use Session in the wreq tutorial[0]. Perhaps it should be stated more prominently, repeated, or even both.

    For non-trivial applications, we’ll always want to use a Session to efficiently and correctly handle multiple requests.

    The Session API provides two important features:

    When we issue multiple HTTP requests to the same server, a Session will reuse TCP and TLS connections for us. (The simpler API we’ve discussed so far does not do this.) This greatly improves efficiency.
0: http://www.serpentine.com/wreq/tutorial.html#session
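For reference, the Session reuse the tutorial describes looks roughly like the sketch below. It assumes the `wreq` and `lens` packages and a wreq version that exports `newSession` (older releases expose `withSession` instead). The URL is a placeholder and the request count is arbitrary.

```haskell
module Main where

import Control.Lens ((^.))
import Control.Monad (forM_)
import Network.Wreq (responseStatus, statusCode)
import qualified Network.Wreq.Session as S

main :: IO ()
main = do
  -- Create ONE session up front and reuse it for every request,
  -- so connections to the same server can be reused.
  sess <- S.newSession
  let url = "http://httpbin.org/get"  -- placeholder URL
  forM_ [1 .. 5 :: Int] $ \_ -> do
    r <- S.get sess url
    print (r ^. responseStatus . statusCode)
```

The pattern to avoid is creating a fresh session (or calling the session-less `get`) inside the loop, which is the memory problem described in the bug report above.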
Yep, that was added after my bug report about it. Also fairly recently, he added the ability to do a request with no CookieJar at all.

Thank you! This is a really good comment.

Also, your bug reports are really solid.