by codygman
4275 days ago
Why don't you think TagSoup would have worked? I've used it for quite a few use cases.
Edit: To make things dead simple, add dom-selector on top: http://hackage.haskell.org/package/dom-selector
It lets you use CSS selectors like so:
  queryT [jq| h2 span.titletext |] root
|
|
It's interesting that XML libs have to invent operators and obnoxious syntax (like HXT's arrow usage or, incidentally, the fact that HXT's parser uses the IO type, which is just crazy talk). dom-selector seems to have the same problem. I prefer readable functions, not DSLs where my code suddenly descends into a magic bizarro-world of operator soup for a moment.
Lenses would make tree-based extraction easier, I think, although lenses aren't easy to understand or that easy to read. Tree traversal with lenses and zippers seems unnecessarily complicated to me.
In a scraper you just want to collect items recursively and return empty/Nothing values for anything that fails a match: collect every item that contains a <div class="h-sku productinfo">, map its h2 to a title and its <div class="price"> to a price, and then combine those two fields into a record. That should result in eminently readable code, not just because it's a conceptually trivial task, but also because someday you will need to go back to the code and remember how it works.
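To make that concrete, here is a rough sketch of the kind of code I mean, using nothing but plain functions. Everything in it is an assumption for illustration: the Node type is a hypothetical minimal tree, not the type of TagSoup, HXT, or any other real library, and the helper names (descendants, firstMatch, toProduct, scrape) are made up. It also treats the matching div itself as the item.

```haskell
import Data.Maybe (fromMaybe, listToMaybe)

-- Hypothetical minimal HTML tree (an assumption, not any library's type).
data Node
  = Element String [(String, String)] [Node]  -- tag name, attributes, children
  | Text String

-- A field is Nothing when its match fails.
data Product = Product { title :: Maybe String, price :: Maybe String }
  deriving (Eq, Show)

-- The node itself followed by all of its descendants, in document order.
descendants :: Node -> [Node]
descendants n@(Element _ _ kids) = n : concatMap descendants kids
descendants n = [n]

-- True if the node is an element with the given tag name.
hasTag :: String -> Node -> Bool
hasTag name (Element t _ _) = t == name
hasTag _ _ = False

-- True if the element's class attribute lists every wanted class.
hasClasses :: [String] -> Node -> Bool
hasClasses wanted (Element _ attrs _) = all (`elem` classes) wanted
  where classes = words (fromMaybe "" (lookup "class" attrs))
hasClasses _ _ = False

-- All text below a node, concatenated.
textOf :: Node -> String
textOf n = concat [s | Text s <- descendants n]

-- First node (the root or a descendant) that satisfies the predicate.
firstMatch :: (Node -> Bool) -> Node -> Maybe Node
firstMatch p = listToMaybe . filter p . descendants

-- Map one product div to a record.
toProduct :: Node -> Product
toProduct item = Product
  { title = textOf <$> firstMatch (hasTag "h2") item
  , price = textOf <$> firstMatch isPrice item
  }
  where isPrice n = hasTag "div" n && hasClasses ["price"] n

-- Collect every <div class="h-sku productinfo"> anywhere in the tree.
scrape :: Node -> [Product]
scrape = map toProduct . filter isItem . descendants
  where isItem n = hasTag "div" n && hasClasses ["h-sku", "productinfo"] n

-- Made-up sample document; the second item has no price.
sample :: Node
sample = Element "body" []
  [ Element "div" [("class", "h-sku productinfo")]
      [ Element "h2" [] [Text "Widget"]
      , Element "div" [("class", "price")] [Text "$9.99"]
      ]
  , Element "div" [("class", "h-sku productinfo")]
      [ Element "h2" [] [Text "Gadget"] ]
  ]

main :: IO ()
main = mapM_ print (scrape sample)
```

The second sample item comes back with a Nothing price instead of blowing up the whole scrape. A real scraper would build the tree from a parser's output first; that step is left out here.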