by masklinn 2061 days ago
> I was doing web scraping, and needed regular expressions to get the text, so I have implemented XPath 2.

Most XPath implementations have no issue with adding extension functions (in fact, many support EXSLT[0] out of the box). You really do not need to use (let alone implement) XPath 2.0 to use regex functions.

[0] http://exslt.org/regexp/index.html
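
To make that concrete, here is a minimal sketch using lxml (a libxml2-based Python binding), which exposes the EXSLT regular-expressions functions once the namespace is bound to a prefix. The sample markup and the `pdf_links` helper are made up for illustration; only the namespace URI and `re:test` come from EXSLT.

```python
from lxml import etree  # third-party: pip install lxml (built on libxml2)

# EXSLT regular-expressions namespace; the "re" prefix is an arbitrary choice.
EXSLT_RE = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical sample page, standing in for a scraped document.
HTML = """
<html><body>
  <a href="/docs/report-2019.pdf">Report</a>
  <a href="/about.html">About</a>
  <a href="/docs/MINUTES.PDF">Minutes</a>
</body></html>
"""


def pdf_links(markup):
    """Return the href of every <a> whose target ends in .pdf, ignoring case."""
    root = etree.HTML(markup)
    # re:test(string, pattern, flags) is evaluated inside plain XPath 1.0.
    return root.xpath(
        r"//a[re:test(@href, '\.pdf$', 'i')]/@href",
        namespaces=EXSLT_RE,
    )


print(pdf_links(HTML))  # ['/docs/report-2019.pdf', '/docs/MINUTES.PDF']
```

The regex filtering happens inside the XPath 1.0 expression itself, so no XPath 2.0 `matches()` is needed. Other libxml2 bindings may require registering the extension explicitly.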


I don't think this especially changes the underlying point: anyone using tools based on libxml2 or Xerces is basically stuck in 1999. Having to find and install custom extensions adds a regular frictional cost, which encourages you to just do more work in a full programming language, since you know you'll be able to satisfy any requirement that way.

I saw so many developers sour on XML after hitting the “This would be easy if we used XPath 2 but instead it's hard” wall that I wonder if anyone on the relevant standards committees ever thought about how much libxml2 support would have done to keep their work relevant.

I did not plan to implement it all, only the parts I needed for the webpages in my city. At first I did not even have backward axes. But people care much more about XPath than they care about my city.

I was also doing too much competitive programming back then, where you have to discover and implement a highly complex algorithm in a few hours.

If such a complex implementation takes a few hours, I could not imagine implementing anything else taking much longer (especially when the spec already says what needs to be implemented and it does not need to be discovered). A few days at most...

But now I am still working on it 14 years later.