
by lobster_johnson 4771 days ago

That Reddit post is great. For me, in learning Haskell, the biggest challenge has been libraries. Many of them just seem badly designed, or unfinished.

For my first project, I wrote a simple tool that scrapes a certain web site and organizes the information in a database behind a front end and an API. Immediately I ran into several problems that would have been a cakewalk in, say, Ruby:

* The HTTP library (Network.HTTP) is not encoding-aware. It ignores Content-Type and returns a String with some undefined encoding. So if you grab a resource which is, say, ISO-8859-1, anything that works with UTF-8 will potentially blow up.

* (And if you do handle it yourself, there is, as far as I can tell, no built-in way to map MIME charset names to Haskell encodings; you have to write that mapping yourself.)

* The string libraries are confusing. There is String, but also Text. Both are supposed to be Unicode-aware, but many of the operations you want are in Text, requiring conversions back and forth.

* I could not, and still haven't, figured out how to convert a ByteString containing ISO-8859-1 to a UTF-8 string. Data.Encoding apparently exists for that, but I could not get it to work like I wanted. I ended up with this lovely mantra (which you wouldn't think had anything to do with ISO-8859-1, but it does) and left it like that just because it works:

    decodeLatin1 bytes = Data.Text.unpack $ Data.Text.pack $
      Data.ByteString.Char8.unpack bytes

* There are multiple libs for HTML/XML parsing, and no clear winner. HaXml was buggy and did not parse my HTML correctly. TagSoup and HandsomeSoup had weird APIs I did not like. I ended up with HXT, which uses Haskell arrow syntax extensively and is therefore incomprehensible, even when I understand what and how it does it. It feels a lot like people who write ambitious libs in Ruby with too much "magic". Haskell is a functional language and should be wonderful for parsing HTML/XML using XPath or CSS selectors, but it's a nightmare. HXT's parser is impure by default (!) and you have to google some tutorials to find out how to use its pure version, which the tutorial writers (in the same breath) dissuade you from using.

* For the frontend I ended up using Happstack Lite because it's small and about as easy to get started with as Sinatra. Unfortunately, the first templating system it recommends, Blaze, is crap. With Blaze you write the template inline, building an XML tree, similar to Ruby's "Builder" gem, but the syntax is ugly and unnecessarily complicated. The other recommended templating systems, HSP and Hamlet, are like ERB or PHP, so a serious step backwards from HAML. In the end, I could not find anything like HAML.

* I haven't delved very deeply into Happstack yet, but like many Haskell libs it seems to make simple things more complicated than they ought to be (e.g., the routing system). One annoyance I remember is the fact that HTTP verbs aren't IO, which makes code less intuitive, since obviously in a modern web app, HTTP verbs are going to be doing IO (things like talking to databases), and so you have to use liftIO a lot.

* Perhaps the biggest issue for me was the general lack of high-quality documentation (Happstack itself has almost none). Sure, there are tons of searchable machine-generated docs, which is great, but that documentation lacks the bits that tie everything together. Since Haskell is mainly about applying functions to values of types, it is very decentralized; knowing what function to use is only part of the game. It's just as important to know how its return value plays with the rest of the function space, because so much of Haskell revolves around composing functions in idiomatic ways. This is very different from, say, Ruby, where you know a method returns an object, and an object only has the methods its class provides; this centralizes, clusters and constrains the information into easily digestible pieces.
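A minimal sketch of doing the charset handling by hand, assuming the response body is a strict ByteString and the Content-Type header value is available as a String. The helpers `charsetOf`, `splitOn`, and `decodeBody` are hypothetical and only handle UTF-8 and ISO-8859-1; `decodeLatin1` and `decodeUtf8'` are real functions from `Data.Text.Encoding` in the text package, and `decodeLatin1` does directly what the pack/unpack round trip above does indirectly.

```haskell
import qualified Data.ByteString as BS
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: extract the charset parameter from a Content-Type
-- value such as "text/html; charset=ISO-8859-1". Naive: lowercases
-- everything and ignores MIME quoting rules beyond stripping '"'.
charsetOf :: String -> Maybe String
charsetOf contentType =
  case [drop (length "charset=") param | param <- params, "charset=" `isPrefixOf` param] of
    (cs : _) -> Just (filter (/= '"') cs)
    []       -> Nothing
  where
    params = map (map toLower . trim) (splitOn ';' contentType)
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Hypothetical helper: split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Decode a body to Text according to its declared charset.
-- Only UTF-8 and Latin-1 are handled; anything else is returned as Left.
decodeBody :: Maybe String -> BS.ByteString -> Either String T.Text
decodeBody charset body = case charset of
  Just "iso-8859-1" -> Right (TE.decodeLatin1 body)
  Just "latin1"     -> Right (TE.decodeLatin1 body)
  Just "utf-8"      -> utf8
  Nothing           -> utf8  -- assumption: treat a missing charset as UTF-8
  Just other        -> Left ("unsupported charset: " ++ other)
  where
    utf8 = either (Left . show) Right (TE.decodeUtf8' body)
```

If the rest of the program wants UTF-8 bytes rather than Text, `Data.Text.Encoding.encodeUtf8` converts the decoded Text back into a UTF-8 ByteString.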

A veteran haskeller would probably not struggle too much with these things. But as a novice, I expected the path to be slightly easier. The language was rarely a problem, the libraries definitely were.


tmhedberg 4771 days ago

Just a few thoughts on the points you made:

> * The HTTP library (Network.HTTP) is not encoding-aware. It ignores Content-Type and returns a String with some undefined encoding. So if you grab a resource which is, say, ISO-8859-1, anything that works with UTF-8 will potentially blow up.

Agreed that the HTTP package is rough around the edges. It also doesn't support HTTPS, which seems like a major shortcoming to me. Fortunately, there are several better-designed alternatives (which have the added benefit of being more efficient), e.g. http-streams and http-conduit.

> * The string libraries are confusing. There is String, but also Text. Both are supposed to be Unicode-aware, but many of the operations you want are in Text, requiring conversions back and forth.

This pain can be mitigated somewhat by using `{-# LANGUAGE OverloadedStrings #-}`, at least when you're working with literal Strings, ByteStrings, and/or Texts. The main problem with the proliferation of string types is when using multiple libraries, each of which expects a different type of string (or one expects a strict ByteString and the other a lazy ByteString). Then the conversions can get irritating.

Generally speaking, you should use Text for textual strings, ByteString for binary data, and String in simple cases when the convenience of using Prelude or list functions trumps performance concerns. They all have reasonably well defined uses.
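A short sketch of those guidelines in practice, using only the text and bytestring packages that ship with GHC. With OverloadedStrings enabled, the same literal syntax produces a String, Text, or ByteString depending on the expected type; the explicit conversions that remain are the ones you write at library boundaries. The binding names are illustrative.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Illustrative bindings; only the library functions are real.

-- The literal's type is chosen from context, so no T.pack is needed.
greeting :: T.Text
greeting = "hello, world"

-- ByteString literals work too, but each Char is truncated to one byte,
-- so they are only safe for ASCII (e.g. binary magic numbers).
magic :: BS.ByteString
magic = "GIF89a"

-- Conversions you still need when libraries disagree:
textToString :: T.Text -> String
textToString = T.unpack

textToUtf8 :: T.Text -> BS.ByteString
textToUtf8 = TE.encodeUtf8

strictToLazy :: BS.ByteString -> BL.ByteString
strictToLazy = BL.fromStrict

lazyToStrict :: BL.ByteString -> BS.ByteString
lazyToStrict = BL.toStrict
```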

> * There are multiple libs for HTML/XML parsing, and no clear winner. HaXml was buggy and did not parse my HTML correctly. TagSoup and HandsomeSoup had weird APIs I did not like. I ended up with HXT, which uses Haskell arrow syntax extensively and is therefore incomprehensible, even when I understand what and how it does it. It feels a lot like people who write ambitious libs in Ruby with too much "magic". Haskell is a functional language and should be wonderful for parsing HTML/XML using XPath or CSS selectors, but it's a nightmare. HXT's parser is impure by default (!) and you have to google some tutorials to find out how to use its pure version, which the tutorial writers (in the same breath) dissuade you from using.

Have you tried the `xml` package [1]? It's not as sophisticated as something like HXT, but for a lot of uses, it does the trick and is far easier to work with. HXT is intimidating for sure, but apparently very powerful once you learn it (I haven't bothered).

> * For the frontend I ended up using Happstack Lite because it's small and about as easy to get started with as Sinatra. Unfortunately, the first templating system it recommends, Blaze, is crap. With Blaze you write the template inline, building an XML tree, similar to Ruby's "Builder" gem, but the syntax is ugly and unnecessarily complicated. The other recommended templating systems, HSP and Hamlet, are like ERB or PHP, so a serious step backwards from HAML. In the end, I could not find anything like HAML.

Funny, I personally think blaze-html is terrific. I like having a Haskell EDSL for HTML generation instead of having to drop into a specialized and restricted "templating language" for that purpose.

If you want something like HAML, check out `hamlet` [2]. It's directly inspired by HAML and is the main templating engine of the Yesod web framework.

I found Snap to be somewhat easier to work with than Happstack, after trying them both. It occupies about the same level of abstraction. Yesod is a more full-stack Rails-like framework, which some people prefer. You can pretty easily mix and match components of the different frameworks however you like; they are nice and modular. I often use snap-core, acid-state, digestive-functors, and blaze-html together for simple web apps, even though they are not all part of any one framework.

> * I haven't delved very deeply into Happstack yet, but like many Haskell libs it seems to make simple things more complicated than they ought to be (e.g., the routing system). One annoyance I remember is the fact that HTTP verbs aren't IO, which makes code less intuitive, since obviously in a modern web app, HTTP verbs are going to be doing IO (things like talking to databases), and so you have to use liftIO a lot.

Frequently having to use `liftIO` is a sign that the library author overspecialized their functions to IO. If more people would write libraries using `MonadIO m => m a` instead of `IO a` that problem would basically disappear. Polymorphism is wonderful, when you are able to apply it.
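To make that concrete, here is a minimal sketch using the mtl package that ships with GHC. The `App` stack, `Config` type, and both logging functions are hypothetical; the point is only the difference in their signatures. The IO-specialized version needs `liftIO` at each call site inside the stack, while the `MonadIO` version can be called directly.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Reader (ReaderT, asks, runReaderT)

-- Hypothetical configuration and application monad stack.
newtype Config = Config { appName :: String }

type App = ReaderT Config IO

-- A "library" function specialized to IO ...
logLineIO :: String -> IO ()
logLineIO msg = putStrLn ("[log] " ++ msg)

-- ... and the same function, polymorphic over any monad that can do IO.
logLine :: MonadIO m => String -> m ()
logLine msg = liftIO (putStrLn ("[log] " ++ msg))

handler :: App String
handler = do
  name <- asks appName
  liftIO (logLineIO "using the IO version")  -- explicit lift required
  logLine "using the MonadIO version"         -- no lift needed in App
  return ("handled by " ++ name)
```

Since `IO` is itself an instance of `MonadIO`, the polymorphic version still works in plain `IO` code, so the library author gives up nothing by generalizing.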

> * Perhaps the biggest issue for me was the general lack of high-quality documentation (Happstack itself has almost none). Sure, there are tons of searchable machine-generated docs, which is great, but that documentation lacks the bits that tie everything together. Since Haskell is mainly about applying functions to values of types, it is very decentralized; knowing what function to use is only part of the game. It's just as important to know how its return value plays with the rest of the function space, because so much of Haskell revolves around composing functions in idiomatic ways. This is very different from, say, Ruby, where you know a method returns an object, and an object only has the methods its class provides; this centralizes, clusters and constrains the information into easily digestible pieces.

I don't disagree that more documentation is a good thing, and some libraries are under-documented. But I can also point to examples of Haskell libraries with superior documentation to nearly anything else I've encountered. Some authors really go the extra mile; it's great.

My experience has been that the type system plus Haddock docs are frequently enough for me to figure out a library even without any prose documentation at all. This, to me, is a huge advantage of Haskell. The types only fit together one way, and that way is the correct one. When using a library with well-designed types, you basically can't get it wrong.

I certainly understand why this would not be so easy for a newbie. It gets much, much easier with experience, though.

[1] http://hackage.haskell.org/package/xml

[2] http://hackage.haskell.org/package/hamlet


lobster_johnson 4771 days ago

I really appreciate the detailed response!

> Fortunately, there are several better-designed alternatives

Do those you mention handle content type and encodings correctly?

> Have you tried the `xml` package [1]?

Yeah, it doesn't do selection based on CSS selectors or XPath expressions. I don't want to have to manually write tree-traversal code when those already exist.

Take a look at my current scraping code [1] if you want to see how I use HXT. (Just promise not to laugh at my novice work.) Basically, the arrow syntax gives you a weird portal into a different world where you select stuff from a tree using a predicate syntax. I say weird because within the proc and the "-<" arrows, it seems you can only deal with XmlTree operations.

> Funny, I personally think blaze-html is terrific.

It's a bit much, but not awful for generating XML programmatically. The main problem is the need to use "toValue" and "toHtml" all the time. I guess I don't understand why the element building functions don't accept strings.

But for templating, I just don't like embedding templates in the actual source code of the HTTP verb it's for. The main controller code should be about preparing data for the UI, and the UI should be separate from the controller code. You could put it in a separate file and function and import it, of course, but it's still Haskell code, which must be compiled along with the entire app, which slows down the development cycle a lot.

I haven't looked closely at the various web frameworks to see if they support Rails-style reloading (i.e., recompiling and reloading the app on each page load). Do you know?

> If you want something like HAML, check out `hamlet`

Hamlet is like someone looked at HAML and didn't grasp why it's so good, because they decided they had to make it look like HTML. Here is HAML:

    div#foo
      ul.list
        - items.each do |item|
          li= item.title

And the same in Hamlet:

    <div #foo>
      <ul .list>
        $forall item <- items
          <li>#{itemTitle item}

Why invent something that looks like HTML but isn't? HAML syntax is basically (almost) CSS selectors, that's the whole point.

And again, it's apparently meant to be placed inline.

The Snap guys (I think) also invented their own versions of SASS, called Cassius and Lucius. Cassius corresponds to the indented SASS syntax, and uses CSS selectors, unlike Hamlet.

> Frequently having to use `liftIO` ...

Oh, interesting. I don't know about MonadIO yet. Will definitely read up on this.

> My experience has been that the type system plus Haddock docs …

Haddock is great, but a lot of it is non-trivial to understand without having to read the docs very closely. The "xml" library is an example of a simple library where, if you browse around, you can piece together how it works, although it's so bare-bones that if you just want to know how to parse some XML, you have to hunt down the right function, which happens to be in Text.XML.Light.Input; nowhere is there a "how to" introduction which illustrates use cases.
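As an illustration of the kind of "how to" that is missing, here is a minimal sketch for the `xml` package. `parseXMLDoc` (defined in Text.XML.Light.Input) and the search helpers are all re-exported by Text.XML.Light; the sample document and the `entryTitles`/`entryLinks` functions are made up. It also shows the manual traversal that CSS selectors or XPath would otherwise replace.

```haskell
import Data.Maybe (mapMaybe)
import Text.XML.Light (Element, findAttr, findChildren, findElements, parseXMLDoc, strContent, unqual)

-- Hypothetical sample document.
sampleDoc :: String
sampleDoc =
  "<feed><entry href=\"/a\"><title>First</title></entry>\
  \<entry href=\"/b\"><title>Second</title></entry></feed>"

-- Text of the <title> child of every <entry>, wherever the entries are.
entryTitles :: Element -> [String]
entryTitles root =
  [ strContent title
  | entry <- findElements (unqual "entry") root
  , title <- findChildren (unqual "title") entry
  ]

-- The href attribute of every <entry>, skipping entries without one.
entryLinks :: Element -> [String]
entryLinks root =
  mapMaybe (findAttr (unqual "href")) (findElements (unqual "entry") root)
```

`parseXMLDoc` returns `Maybe Element`, so a parse failure shows up as `Nothing` rather than an exception.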

With more complex libraries, the machine-generated docs become increasingly obtuse. For example, I would never, ever have figured out how to use HXT with Hackage alone. I mean, look at it! [2] Is the parser in Text.XML.HXT.Arrow.ParserInterface? Nope. In Text.XML.HXT.Arrow.ReadDocument? Curiously, no. The function to parse a document, runX, is apparently not even in the Hackage database. And good luck finding out that this is the function you need to use, never mind the semantics of XML arrows.
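For reference, `runX` is re-exported from Text.XML.HXT.Core, and a small scraper can be written with plain arrow composition (`>>>`) rather than `proc` notation. A sketch, assuming HXT 9.x; `linkTargets` and the sample HTML are illustrative.

```haskell
import Text.XML.HXT.Core

-- linkTargets is an illustrative name, not part of HXT.
-- readString builds an arrow that parses the input leniently as HTML;
-- deep finds every <a> element; getAttrValue reads its href.
-- runX runs the whole pipeline in IO and collects every result.
linkTargets :: String -> IO [String]
linkTargets src =
  runX $
    readString [withParseHTML yes, withWarnings no] src
      >>> deep (isElem >>> hasName "a")
      >>> getAttrValue "href"
```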

[1] https://github.com/alexstaubo/pfinn/blob/master/Scraping.hs

[2] http://hackage.haskell.org/package/hxt


tmhedberg 4771 days ago

> Do those you mention handle content type and encodings correctly?

I honestly can't say for sure, but I've never encountered such a problem in my own usage. Maybe that just means I've been lucky. I just know from firsthand experience that they do a lot of things better than the HTTP package does.

> The main problem is the need to use "toValue" and "toHtml" all the time. I guess I don't understand why the element building functions don't accept strings.

First of all, if you're writing string literals and explicitly converting them with `toHtml`, you're missing out. Instead, enable the OverloadedStrings extension, and then you can do this sort of thing:

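    {-# LANGUAGE OverloadedStrings #-}
    import Prelude hiding (head)
    import Text.Blaze.Html5
    import Text.Blaze.Html5.Attributes (href)

    page :: Html
    page = html $ do
      head $ title "My Web Page"
      body $ do
        h1 "A Heading"
        h2 "A Subheading"
        hr
        p "A paragraph of text"
        p $ "Another paragraph, with an " <> (a ! href "http://example.com/") "embedded link"
        p $ em "Emphasized text" <> " and " <> strong "strongly emphasized text"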

Notice the total lack of `toHtml` there.

The HTML DSL functions don't accept strings because String and Html are two fundamentally different types of data. Enabling the type system to differentiate between the two lets you do some pretty nifty things. For instance, if you were to write

    "<script>alert('XSS!');</script>" :: String

you'd be in trouble, but you can't do that since blaze-html doesn't accept plain old strings. Instead, you'd write

    "<script>alert('XSS!');</script>" :: Html

Note how only the type is changed, but now the string will be transparently converted to `&lt;script&gt;alert(&#39;XSS!&#39;);&lt;/script&gt;` before it's added to the document, and thus you're safe, without having to do anything different. The type signatures I gave are superfluous, of course, since they'd be inferred automatically. The type system makes this kind of flaw impossible.
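The mechanism behind that is easy to imitate in a few lines. The sketch below is a hypothetical toy, not blaze-html's actual implementation: a newtype `Html` whose `IsString` instance escapes its contents, so an OverloadedStrings literal used as `Html` is escaped automatically, while raw markup requires an explicitly named escape hatch (`preEscaped` here, loosely mirroring blaze's `preEscapedToHtml`).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString (..))

-- Hypothetical toy markup type; not blaze-html's real representation.
newtype Html = Html { renderHtml :: String }

instance Semigroup Html where
  Html a <> Html b = Html (a ++ b)

-- String literals typed as Html are escaped on the way in.
instance IsString Html where
  fromString = Html . concatMap escape
    where
      escape '<'  = "&lt;"
      escape '>'  = "&gt;"
      escape '&'  = "&amp;"
      escape '"'  = "&quot;"
      escape '\'' = "&#39;"
      escape c    = [c]

-- Trusted markup has to be requested explicitly (hypothetical name).
preEscaped :: String -> Html
preEscaped = Html

-- A hypothetical paragraph combinator built from the pieces above.
p :: Html -> Html
p inner = preEscaped "<p>" <> inner <> preEscaped "</p>"
```

Because a plain `String` is never silently accepted where `Html` is expected, the only way to emit unescaped text is to call the escape hatch by name, which is easy to spot in review.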

Plus, since blaze combinators are just Haskell, and the Html type is a monad, you can use all the usual Monad functions with Html:

    ol $ mapM_ (li . toHtml) ['A'..'Z']

That generates a 26-item list of letters from A to Z. Yes, you do have to do the explicit conversion to Html there, but I think that's a small price to pay for all the flexibility afforded by templating HTML with full-fledged Haskell code.

> But for templating, I just don't like embedding templates in the actual source code of the HTTP verb it's for. The main controller code should be about preparing data for the UI, and the UI should be separate from the controller code. You could put it in a separate file and function and import it, of course, but it's still Haskell code, which must be compiled along with the entire app, which slows down the development cycle a lot.

Yeah, I just put "view code" like the above examples into a separate module and import it. I don't find the slowdown to be significant, but maybe that's just me. In any case, true template languages have to be compiled too, either with the rest of the program or at runtime, so one way or another you incur that cost no matter what you're using.

> I haven't looked closely at the various web frameworks to see if they support Rails-style reloading (i.e., recompiling and reloading the app on each page load). Do you know?

Yesod does this by default, and Snap can, if you use `snap-loader-dynamic` for your development builds. I believe they both use the awesome `hint` package for runtime eval and compilation of Haskell code. I don't think Happstack has any auto-reloading capability though.

> Why invent something that looks like HTML but isn't? HAML syntax is basically (almost) CSS selectors, that's the whole point. And again, it's apparently meant to be placed inline.

It doesn't seem that different from HAML to me, but it's clearly a subjective judgment. :) And it doesn't have to be inline. It uses the QuasiQuotes language extension, which means you can do inline snippets or write it in a separate file, however you like.

Hamlet, Cassius, Lucius, and Julius (collectively, the "Shakespearean" template languages) are developed by the Yesod framework guys, not Snap, by the way. Snap uses a template language called Heist by default, which I don't particularly care for (I just use blaze-html instead).

> Oh, interesting. I don't know about MonadIO yet. Will definitely read up on this.

If you use liftIO, you're using MonadIO. That's what liftIO does: it's an adapter between plain IO values and polymorphic MonadIO values. But if library authors use MonadIO instead of IO, you can omit the explicit conversion. This isn't something you really have control over as the user of a library, though.
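What the adapter amounts to is easiest to see with a hand-written instance. The sketch below defines a hypothetical application monad `App` (a `ReaderT` over `IO`, using mtl) and writes its `MonadIO` instance by hand: `liftIO` simply embeds a plain `IO` action, ignoring the environment.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.IO.Class (MonadIO (..))
import Control.Monad.Reader (ReaderT (ReaderT), ask, runReaderT)

-- Hypothetical application monad: a read-only Int environment over IO.
newtype App a = App { runApp :: ReaderT Int IO a }
  deriving (Functor, Applicative, Monad)

-- The adapter, written out by hand: an IO action becomes an App action
-- that ignores the environment and just runs the IO.
instance MonadIO App where
  liftIO io = App (ReaderT (\_env -> io))

-- A plain IO action used inside App via liftIO.
doubledEnv :: App Int
doubledEnv = do
  liftIO (putStrLn "running a plain IO action inside App")
  n <- App ask
  return (n * 2)
```

In practice the instance would usually be derived as well (or written as `App (liftIO io)`); spelling it out just shows there is no magic in the conversion.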

> With more complex libraries, the machine-generated docs become increasingly obtuse. For example, I would never, ever have figured out how to use HXT with Hackage alone. I mean, look at it!

Fully agreed. HXT is a huge, complex beast! I don't think it's fair to judge all libraries by that standard, though. HXT is clearly at an extreme end of that continuum.


lobster_johnson 4771 days ago

I have been cursing the string handling and now you tell me OverloadedStrings exists. This makes sense. Thanks for the help, that actually makes Blaze a little closer to HAML for me.

I appreciate the input on the other things. I will check out Snap, I think.


tel 4771 days ago

> My experience has been that the type system plus Haddock docs are frequently enough for me to figure out a library even without any prose documentation at all. This, to me, is a huge advantage of Haskell. The types only fit together one way, and that way is the correct one. When using a library with well-designed types, you basically can't get it wrong.

This is (a) exactly true and (b) a huge problem. For the expert Haskeller it's usually fairly trivial to figure out the semantics and operations of a library by exploring the types. This means that few experts are incentivized to explain the concepts of their libraries well. This is doubly compounded by the fact that so many Haskell paradigms are uniform across all libraries so it's easy to say "it's got an Alternative-Bifunctor-Semigroup interface" and assume the user will figure out what that means elsewhere.

Now, genuinely new abstractions like `pipes` get very thorough documentation because it pays to teach new abstractions just once. That stuff really ought to be decorating all of the stable, introductory libraries, though. Without it the entirety of Hackage looks terribly uninviting.


lobster_johnson 4771 days ago

Btw, I just discovered that Hoogle doesn't index many packages. Apparently "Hayoo" is the one to use:

http://holumbus.fh-wedel.de/hayoo/hayoo.html

At least it finds stuff like HXT's runX. Unfortunately, it seems buggy in that you cannot link directly to a search.
