by irjustin
2058 days ago
Anyone who does scraping or automated browser work eventually comes across XPath. In some ways, XPath is like regex. It's got insane power, but comes with a relatively steep learning curve. Remember reading regex for the first time? What? But unlike regex, far fewer people use it. I avoided XPath until I couldn't anymore. I could do a lot with CSS selectors, but eventually the DOM traversal became difficult to reason about with just CSS. After taking the dive, I found it's so powerful. Read a single XPath and, like regex, you can fully understand what the thing is going after and how it will get there. There are functions in XPath 2.0 that I would love to have, but Nokogiri for Rails is stuck in 1.0 world with no plan to go to 2.0. Sad, but I'll live.
IMO the learning curve of XPath is not that high though. It has a somewhat alien syntax, but the only thing I remember giving me trouble is axes, because most tutorials stick to the "shortcut" (abbreviated) syntax, so the first time you encounter an explicit axis everything goes pear-shaped.
> There are functions in XPath 2.0 that I would love to have, but Nokogiri for Rails is stuck in 1.0 world with no plan to go to 2.0. Sad, but I'll live.
Nokogiri should support function extensions[0], and most of the XPath 2.0 functions were originally extensions to 1.0[1], so even if these functions are not distributed with Nokogiri you should be able to add them yourself.
Incidentally, Nokogiri seems to optionally depend on libexslt, which is the EXSLT implementation in C for libxml2/libxslt, so EXSLT should be available either as an option or by building it yourself.
[0] https://github.com/sparklemotion/nokogiri/commit/eb56525fbcc...
[1] http://exslt.org