| Just a few thoughts on the points you made:
> * The HTTP library (Network.HTTP) is not encoding-aware. It ignores Content-Type and returns a String with some undefined encoding. So if you grab a resource which is, say, ISO-8859-1, anything that works with UTF-8 will potentially blow up.
Agreed that the HTTP package is rough around the edges. It also doesn't support HTTPS, which seems like a major shortcoming to me. Fortunately, there are several better-designed alternatives (which have the added benefit of being more efficient), e.g. http-streams and http-conduit.
> * The string libraries are confusing. There is String, but also Text. Both are supposed to be Unicode-aware, but many of the operations you want are in Text, requiring conversions back and forth.
This pain can be mitigated somewhat by using `{-# LANGUAGE OverloadedStrings #-}`, at least when you're working with literal Strings, ByteStrings, and/or Texts. The main problem with the proliferation of string types is when using multiple libraries, each of which expects a different type of string (or one expects a strict ByteString and the other a lazy ByteString). Then the conversions can get irritating.
Generally speaking, you should use Text for textual strings, ByteString for binary data, and String in simple cases when the convenience of using Prelude or list functions trumps performance concerns. They all have reasonably well defined uses.
> * There are multiple libs for HTML/XML parsing, and no clear winner. HaXml was buggy and did not parse my HTML correctly. TagSoup and HandsomeSoup had weird APIs I did not like. I ended up with HXT, which uses Haskell arrow syntax extensively and is therefore incomprehensible, even when I understand what and how it does it. It feels a lot like people who write ambitious libs in Ruby with too much "magic". Haskell is a functional language and should be wonderful for parsing HTML/XML using XPath or CSS selectors, but it's a nightmare.
> HXT's parser is impure by default (!) and you have to google some tutorials to find out how to use its pure version, which the tutorial writers (in the same breath) dissuade you from using.
Have you tried the `xml` package [1]? It's not as sophisticated as something like HXT, but for a lot of uses, it does the trick and is far easier to work with. HXT is intimidating for sure, but apparently very powerful once you learn it (I haven't bothered).
> * For the frontend I ended up using Happstack Lite because it's small and about as easy to get started with as Sinatra. Unfortunately, the first templating system it recommends, Blaze, is crap. With Blaze you write the template inline, building an XML tree, similar to Ruby's "Builder" gem, but the syntax is ugly and unnecessarily complicated. The other recommended templating systems, HSP and Hamlet, are like ERB or PHP, so a serious step backwards from HAML. In the end, I could not find anything like HAML.
Funny, I personally think blaze-html is terrific. I like having a Haskell EDSL for HTML generation instead of having to drop into a specialized and restricted "templating language" for that purpose. If you want something like HAML, check out `hamlet` [2]. It's directly inspired by HAML and is the main templating engine of the Yesod web framework.
I found Snap to be somewhat easier to work with than Happstack, after trying them both. It occupies about the same level of abstraction. Yesod is a more full-stack Rails-like framework, which some people prefer. You can pretty easily mix and match components of the different frameworks however you like; they are nice and modular. I often use snap-core, acid-state, digestive-functors, and blaze-html together for simple web apps, even though they are not all part of any one framework.
> * I haven't delved very deeply into Happstack yet, but like many Haskell libs it seems to make simple things more complicated than they ought to be (e.g., the routing system).
> One annoyance I remember is the fact that HTTP verbs aren't IO, which makes code less intuitive, since obviously in a modern web app, HTTP verbs are going to be doing IO (things like talking to databases), and so you have to use liftIO a lot.
Frequently having to use `liftIO` is a sign that the library author overspecialized their functions to IO. If more people would write libraries using `MonadIO m => m a` instead of `IO a` that problem would basically disappear. Polymorphism is wonderful, when you are able to apply it.
> * Perhaps the biggest issue for me was the general lack of high-quality documentation (Happstack itself has almost none). Sure, there are tons of searchable machine-generated docs, which is great, but that documentation lacks the bits that tie everything together. Since Haskell is mainly about applying functions to values of types, it is very decentralized; knowing what function to use is only part of the game. It's just as important to know how its return value plays with the rest of the function space, because so much of Haskell revolves around composing functions in idiomatic ways. This is very different from, say, Ruby, where you know a method returns an object, and an object only has the methods its class provides; this centralizes, clusters and constrains the information into easily digestible pieces.
I don't disagree that more documentation is a good thing, and some libraries are under-documented. But I can also point to examples of Haskell libraries with superior documentation to nearly anything else I've encountered. Some authors really go the extra mile; it's great.
My experience has been that the type system plus Haddock docs are frequently enough for me to figure out a library even without any prose documentation at all. This, to me, is a huge advantage of Haskell. The types only fit together one way, and that way is the correct one. When using a library with well-designed types, you basically can't get it wrong.
I certainly understand why this would not be so easy for a newbie. It gets much, much easier with experience, though.
[1] http://hackage.haskell.org/package/xml
[2] http://hackage.haskell.org/package/hamlet |
> Fortunately, there are several better-designed alternatives
Do those you mention handle content type and encodings correctly?
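Whichever client is used, the encoding step itself can be made explicit once the body is available as raw bytes. A minimal sketch, assuming the body arrives as a strict ByteString and the charset label has already been pulled out of the Content-Type header; `decodeBody` is a hypothetical helper, not part of any of the packages mentioned, and it only covers two charsets:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical helper: turn raw response bytes into Text, given the
-- charset label from the Content-Type header.
decodeBody :: T.Text -> B.ByteString -> T.Text
decodeBody charset body =
  case T.toLower (T.strip charset) of
    -- Every byte maps directly to the code point with the same number.
    "iso-8859-1" -> TE.decodeLatin1 body
    -- Assumption: treat anything else as UTF-8 and replace invalid
    -- byte sequences with U+FFFD instead of throwing.
    _            -> TE.decodeUtf8With lenientDecode body
```

The point is only that the bytes stay a ByteString until a charset has been chosen, so nothing downstream ever sees a String with an undefined encoding.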
> Have you tried the `xml` package [1]?
Yeah, it doesn't do selection based on CSS selectors or XPath expressions. I don't want to have to manually write tree-traversal code when those already exist.
Take a look at my current scraping code [1] if you want to see how I use HXT. (Just promise not to laugh at my novice work.) Basically, the arrow syntax gives you a weird portal into a different world where you select stuff from a tree using a predicate syntax. I say weird because within the proc and the "-<" arrows, it seems you can only deal with XmlTree operations.
> Funny, I personally think blaze-html is terrific.
It's a bit much, but not awful for generating XML programmatically. The main problem is the need to use "toValue" and "toHtml" all the time. I guess I don't understand why the element building functions don't accept strings.
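For what it's worth, here is a minimal sketch of how this looks with `OverloadedStrings` switched on, assuming blaze-html's `Text.Blaze.Html5` modules; `staticLink` and `userLink` are made-up names for illustration. String literals are accepted directly, and `toHtml`/`toValue` are only needed for values computed at runtime:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Text.Blaze.Html5 (Html, (!), toHtml, toValue)
import qualified Text.Blaze.Html5 as H
import qualified Text.Blaze.Html5.Attributes as A
import Text.Blaze.Html.Renderer.String (renderHtml)

-- Literals: the extension lets "/about" be an attribute value and
-- "About" be Html, so no conversion calls are needed.
staticLink :: Html
staticLink = H.a ! A.href "/about" $ "About"

-- Runtime Strings: these are ordinary String values, not literals,
-- so they still have to be converted explicitly.
userLink :: String -> String -> Html
userLink url label = H.a ! A.href (toValue url) $ toHtml label

-- Render either value to a String, e.g. for inspection in GHCi.
render :: Html -> String
render = renderHtml
```

The explicit conversions are also where escaping of the text and attribute values happens, which is one reason the element functions take `Html` rather than a bare String.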
But for templating, I just don't like embedding templates in the actual source code of the HTTP verb it's for. The main controller code should be about preparing data for the UI, and the UI should be separate from the controller code. You could put it in a separate file and function and import it, of course, but it's still Haskell code, which must be compiled along with the entire app, which slows down the development cycle a lot.
I haven't looked closely at the various web frameworks to see if they support Rails-style reloading (i.e., recompiling and reloading the app on each page load). Do you know?
> If you want something like HAML, check out `hamlet`
Hamlet is like someone looked at HAML and didn't grasp why it's so good, because they decided they had to make it look like HTML. Here is HAML:
And the same in Hamlet:
Why invent something that looks like HTML but isn't? HAML syntax is basically (almost) CSS selectors, that's the whole point. And again, it's apparently meant to be placed inline.
The Yesod guys (I think) also invented their own versions of SASS, called Cassius and Lucius. Cassius corresponds to the indented SASS syntax, and uses CSS selectors, unlike Hamlet.
> Frequently having to use `liftIO` ...
Oh, interesting. I don't know about MonadIO yet. Will definitely read up on this.
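For reference, a minimal sketch of the difference being described, assuming only `base` and `transformers`; `logLineIO`, `logLine` and `handler` are hypothetical names, and `ReaderT String IO` stands in for whatever monad a web framework's handlers run in:

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT, ask, runReaderT)

-- Specialised to IO: callers in any other monad must wrap it in liftIO.
logLineIO :: String -> IO ()
logLineIO = putStrLn

-- Generalised: usable as-is in IO, ReaderT r IO, or any other MonadIO stack.
logLine :: MonadIO m => String -> m ()
logLine = liftIO . putStrLn

-- A handler-like action running in a monad that is not plain IO.
handler :: ReaderT String IO Int
handler = do
  name <- ask
  liftIO (logLineIO ("hello, " ++ name)) -- explicit lift required
  logLine ("goodbye, " ++ name)          -- no lift required
  return (length name)
```

Running `runReaderT handler "world"` in GHCi exercises both calls; the only difference at the call site is whether the `liftIO` has to be written by hand.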
> My experience has been that the type system plus Haddock docs …
Haddock is great, but a lot of it is non-trivial to understand without having to read the docs very closely. The "xml" library is an example of a simple library where, if you browse around, you can piece together how it works, although it's so bare-bones that if you just want to know how to parse some XML, you have to hunt down the right function, which happens to be in Text.XML.Light.Input; nowhere is there a "how to" introduction which illustrates use cases.
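For anyone else hunting for it, a minimal sketch of the basic use case, assuming the `xml` package's umbrella module `Text.XML.Light` (which re-exports `Text.XML.Light.Input`); `linkTargets` is a made-up function for illustration and the input is assumed to be well-formed XML without namespaces:

```haskell
import Text.XML.Light
  (findAttr, findElements, parseXMLDoc, strContent, unqual)

-- | Collect (href, link text) pairs for every <a> element that has an
-- href attribute. Returns [] if the input does not parse as a document.
linkTargets :: String -> [(String, String)]
linkTargets src =
  case parseXMLDoc src of
    Nothing   -> []
    Just root ->
      [ (href, strContent e)
      | e <- findElements (unqual "a") root      -- search the whole tree
      , Just href <- [findAttr (unqual "href") e] -- skip <a> without href
      ]
```

This is still name-based lookup rather than XPath or CSS selection, so it does not address the selector complaint above.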
With more complex libraries, the machine-generated docs become increasingly obtuse. For example, I would never, ever have figured out how to use HXT with Hackage alone. I mean, look at it! [2] Is the parser in Text.XML.HXT.Arrow.ParserInterface? Nope. In Text.XML.HXT.Arrow.ReadDocument? Curiously, no. The function to parse a document, runX, is apparently not even in the Hackage database. And good luck finding out that this is the function you need to use, never mind the semantics of XML arrows.
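To make the entry points concrete, here is a minimal sketch of both the IO and the pure route, assuming everything is taken from the umbrella module `Text.XML.HXT.Core` and written with plain combinators instead of `proc` notation; `extractLinksIO` and `extractLinks` are hypothetical names:

```haskell
import Text.XML.HXT.Core

-- IO version: readString builds the document arrow, runX runs it.
extractLinksIO :: String -> IO [String]
extractLinksIO html =
  runX ( readString [withParseHTML yes, withWarnings no] html
         >>> deep (isElem >>> hasName "a") -- every <a> element in the tree
         >>> getAttrValue "href" )         -- its href ("" if absent)

-- Pure version: hread parses an HTML fragment without IO, runLA runs
-- the resulting list arrow on the input string.
extractLinks :: String -> [String]
extractLinks =
  runLA ( hread
          >>> deep (isElem >>> hasName "a")
          >>> getAttrValue "href" )
```

Both versions are the same pipeline; only the parser arrow at the front and the function used to run it differ.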
[1] https://github.com/alexstaubo/pfinn/blob/master/Scraping.hs
[2] http://hackage.haskell.org/package/hxt