by alpha_squared 1746 days ago
Do you mind giving an example? I'm having trouble following where CSS is limited for selection.
5 comments

XPath does general data processing, not just selection.

E.g. when you have a list of numbers on the website, XPath can calculate the sum or the maximum of the numbers.

Or you have a list of names "Last name, First name", then you can remove the last name and sort the first names alphabetically. Or count how often each name occurs and return the most popular name.

Then it goes back to selection, e.g. select all numbers that are smaller than the average. Or calculate the most popular name, then select all elements containing that name.
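
To make the list-of-numbers case concrete, here is a minimal Python sketch. The `<ul>` of numbers is made up for illustration. It uses the standard library's `xml.etree.ElementTree`, which only implements a small XPath subset without functions such as `sum()`, so the arithmetic is done in Python. The comments give the XPath a fuller engine should accept (`sum()` and `count()` exist in XPath 1.0; `max()` and `avg()` need XPath 2.0 or later).

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a list of numbers as it might appear on a page.
DOC = "<ul><li>4</li><li>10</li><li>1</li><li>9</li></ul>"
root = ET.fromstring(DOC)

# Selection: the li children of the root ul.
items = root.findall("li")
values = [float(li.text) for li in items]

# Data processing. Rough XPath equivalents: sum(//li), max(//li), avg(//li)
total = sum(values)
largest = max(values)
average = total / len(values)

# Back to selection. Rough XPath 2.0 equivalent: //li[number(.) < avg(//li)]
below_average = [li.text for li in items if float(li.text) < average]

print(total, largest, average, below_average)
```

With a full XPath engine the whole thing collapses into the single query shown in the last comment, which is the point being made above.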

Like the other commenter says: parent/child. But also selecting by content (e.g. "click the button with the delete-icon" or "find the link with '@harrypotter'") or selecting by attributes (e.g. click the pager-item that goes to the next page) or selecting items outside of body (e.g. og-tags, title etc). All are doable in CSS3 selectors, but everything shouts that they are not meant for this, whereas XPath does this far more naturally.
The element(s) before an element: //h3/preceding-sibling::p[1]
Match something's parent: //title/..
Match all ancestors: //title[@id = 'abc']/ancestor::comment

Element with src or href attr: //*[@src or @href]
or multiple conditions: //article[@state = "approved" and not(comments/comment)]

Element with more than two children: //ul[count(li) > 2]
Element with matching descendants: //article[.//video]

Element text containing substring: //p[contains(text(), "Foo")]
Attribute ending with substring: //a[ends-with(@href, ".jpg")]

Numerical attribute selection: //product[@price > round(2.5 * @discount)]
//product[sum(//*[starts-with(name(), 'price-')]/@price) > 0]

Attribute values: //a/@href
Text values with spaces normalised: //a/normalize-space(text())

Match all attributes or elements or text nodes: //user/@* or //user/node() or //user/text() or //user/comment()

Basically, from any node in a document you can select its ancestors, children, descendants, siblings, attributes etc., and filtering has the same power as selecting does. In CSS there's :not(), which can apply to selection or filtering, with :has() finally on the way and no :or(). CSS selectors match against HTML elements and they're great for that almost all of the time, but while you can filter by attribute value, including prefix, suffix and substring matches, for text there's only :empty.

But for a query syntax you need to be able to select attributes and text content as well as elements. Either extend XPath to support #id and .class syntax:

  //#user-xyz//note/text()
  //code.language-js/@name

or extend CSS to allow selecting attrs and text:

  #user-xyz note :text
  code.language-js @name

The former is more powerful, the latter a quick hack (if they only appear at the end of the selector anyway) with instant payoff.
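
For comparison, a rough sketch of the two-step pattern that element-only selection forces today: select the elements with a query, then read the text or attribute through the host API. This assumes a made-up document and Python's standard `xml.etree.ElementTree`, whose XPath subset returns elements only. Note that `[@class='language-js']` is an exact match on the whole attribute value, which is stricter than CSS's `.language-js`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
DOC = """
<root>
  <user id="user-xyz"><note>first</note><group><note>second</note></group></user>
  <user id="user-abc"><note>other</note></user>
  <code class="language-js" name="demo.js">let x = 1;</code>
  <code class="language-py" name="demo.py">x = 1</code>
</root>
"""
root = ET.fromstring(DOC)

# Step 1 selects elements, step 2 reads their text via the API.
# Roughly what //*[@id='user-xyz']//note/text() would return directly.
note_texts = [n.text for n in root.findall(".//*[@id='user-xyz']//note")]

# Same idea for attributes.
# Roughly what //code[@class='language-js']/@name would return directly.
names = [c.get("name") for c in root.findall(".//code[@class='language-js']")]

print(note_texts, names)
```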

Searching text content is my main remaining use of XPath.
Well, the big one is selecting a parent from the child.
You could do this with the :has() CSS pseudo-class[0], though inverted (select a parent that _has_ the child matching a selector).

Looks like that pseudo-class has not been implemented in the kuchiki library that htmlq uses though.

[0]: https://developer.mozilla.org/en-US/docs/Web/CSS/:has

You can do it either way in XPath, thanks to how you can use a path expression and/or predicates almost everywhere in a query:

  # Find all elements li and select the parent element for each
  //li/.. 

  # Find all element nodes with a child element named li
  //*[li]

  # Non-abbreviated queries
  /descendant::li/parent::*
  /descendant::*[child::li]

  # CSS using :has
  :has(> li)
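
If you want to try both directions without installing anything, here is a small sketch using Python's standard `xml.etree.ElementTree`, whose limited XPath subset happens to support both the `..` step and the `[tag]` predicate. The document and ids are made up for illustration. Queries are written relative to the root (`.//`) because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this example.
DOC = """
<body>
  <ul id="menu"><li>Home</li><li>About</li></ul>
  <ol id="steps"><li>One</li></ol>
  <p id="intro">No list items here</p>
</body>
"""
root = ET.fromstring(DOC)

# Child to parent: find every li, then step up to its parent.
via_parent_step = [el.get("id") for el in root.findall(".//li/..")]

# Parent with child: keep any descendant element that has an li child.
via_predicate = [el.get("id") for el in root.findall(".//*[li]")]

print(via_parent_step)
print(via_predicate)
```

Both queries should give the same two parents here, the `ul` and the `ol`, and the `p` is skipped.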