by lobster_johnson
4274 days ago
The project wasn't that recent, so I don't quite remember, but I would have wanted something like dom-selector, and that one didn't come up in my searches for solutions.

It's interesting that XML libs have to invent operators and obnoxious syntax (like HXT's use of arrows, or, incidentally, the fact that HXT's parser uses the IO type, which is just crazy talk). dom-selector seems to have the same problem. I prefer readable functions, not DSLs where my code suddenly descends into a magic bizarro-world of operator soup for a moment.

Lenses would make tree-based extraction easier, I think, although they aren't easy to understand or to read, and tree traversal with lenses and zippers seems unnecessarily complicated to me. In a scraper you just want to collect items recursively and return empty/Nothing values for anything that fails to match: collect every item that contains a <div class="h-sku productinfo">, map its h2 to a title and its <div class="price"> to a price, and then combine those two fields into a record. That should result in eminently readable code, not just because it's a conceptually trivial task, but also because someday you'll need to go back to the code and remember how it works.
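For comparison, here is a rough sketch of the plain-function, Maybe-based style described in the comment. It deliberately uses no real HTML library (not HXT, not dom-selector): the `Node` type and every helper (`descendants`, `findFirst`, `toProduct`, `scrapeProducts`, `samplePage`) are hypothetical, and it assumes the product "item" is the `<div class="h-sku productinfo">` element itself. A real scraper would get the tree from an HTML parser.

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical, minimal HTML tree; a real scraper would build this
-- with an HTML parser.
data Node
  = Element String [(String, String)] [Node] -- tag, attributes, children
  | Text String
  deriving (Show, Eq)

-- The record we want out of each product block.
data Product = Product { title :: String, price :: String }
  deriving (Show, Eq)

children :: Node -> [Node]
children (Element _ _ cs) = cs
children (Text _) = []

isTag :: String -> Node -> Bool
isTag t (Element name _ _) = name == t
isTag _ (Text _) = False

-- Split the class attribute on whitespace, so "h-sku productinfo"
-- counts as having both classes.
hasClass :: String -> Node -> Bool
hasClass c (Element _ attrs _) = c `elem` maybe [] words (lookup "class" attrs)
hasClass _ (Text _) = False

-- Every node in the subtree, in document order (pre-order).
descendants :: Node -> [Node]
descendants n = n : concatMap descendants (children n)

-- Concatenated text content of a node.
textOf :: Node -> String
textOf (Text s) = s
textOf n = concatMap textOf (children n)

-- First node in the subtree matching a predicate, or Nothing.
findFirst :: (Node -> Bool) -> Node -> Maybe Node
findFirst p = listToMaybe . filter p . descendants

isProductBlock :: Node -> Bool
isProductBlock n = isTag "div" n && hasClass "h-sku" n && hasClass "productinfo" n

-- One product block -> Maybe Product; Nothing if either field is missing.
toProduct :: Node -> Maybe Product
toProduct block = do
  t <- findFirst (isTag "h2") block
  p <- findFirst (\n -> isTag "div" n && hasClass "price" n) block
  pure (Product (textOf t) (textOf p))

-- Collect every product block in the page, dropping blocks that fail to match.
scrapeProducts :: Node -> [Product]
scrapeProducts = mapMaybe toProduct . filter isProductBlock . descendants

-- Hypothetical sample page: one complete product, one missing its price.
samplePage :: Node
samplePage =
  Element "body" []
    [ Element "div" [("class", "h-sku productinfo")]
        [ Element "h2" [] [Text "Widget"]
        , Element "div" [("class", "price")] [Text "$9.99"]
        ]
    , Element "div" [("class", "h-sku productinfo")]
        [Element "h2" [] [Text "No price here"]]
    ]

main :: IO ()
main = mapM_ print (scrapeProducts samplePage)
```

Running `main` prints `Product {title = "Widget", price = "$9.99"}`. The second block has no price div, so `toProduct` returns `Nothing` and `mapMaybe` silently drops it.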

Bizarro world of operator soup? I don't really follow you. That dom-selector code just compiles down to ordinary functions itself. I don't see how anything could be clearer than a CSS selector for selecting an HTML element.