I'd like to state my support for the author's choice of CSS selectors in this particular use case. I think it's a natural fit for this domain and already very well known, perhaps even better known than XPath.
I'd like to add my support here too, but with a note.
When scraping and parsing (or writing an integration-test DSL), I always start out with CSS selectors. But I always hit cases where they fall short or require hoop-jumping, and then I fall back on XPath. I then have a codebase with both CSS selectors and XPath, which is arguably worse than having only one method.
I suspect that here, too, one uses this tool until CSS selector limitations get in the way, after which one switches to another tool(chain).
XPath does general data processing, not just selection.
E.g. when you have a list of numbers on the website, XPath can calculate the sum or the maximum of the numbers.
Or you have a list of names "Last name, First name"; then you can remove the last name and sort the first names alphabetically. Or count how often each name occurs and return the most popular name.
Then it goes back to selection, e.g. select all numbers that are smaller than the average. Or calculate the most popular name, then select all elements containing that name.
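To make this concrete, here is a rough sketch in Python using the standard library's xml.etree.ElementTree. ElementTree only implements a small XPath subset (plain paths and simple predicates, no arithmetic or functions like sum()), so it does the selection and Python does the arithmetic; the comments give XPath expressions that would do those steps directly. The markup, ids and names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample page: a list of numbers and a list of "Last, First" names.
doc = ET.fromstring(
    "<div>"
    "<ul id='prices'><li>3</li><li>10</li><li>5</li><li>2</li></ul>"
    "<ul id='people'><li>Smith, Anna</li><li>Jones, Bob</li>"
    "<li>Brown, Anna</li><li>Lee, Carl</li></ul>"
    "</div>"
)

price_items = doc.findall("./ul[@id='prices']/li")
numbers = [float(li.text) for li in price_items]

# XPath 1.0: sum(//ul[@id='prices']/li)
total = sum(numbers)
# XPath 1.0 has no max(); a common idiom is //ul[@id='prices']/li[not(. < ../li)]
maximum = max(numbers)
# XPath 1.0: //ul[@id='prices']/li[. < sum(../li) div count(../li)]
average = total / len(numbers)
below_average = [li.text for li in price_items if float(li.text) < average]

names = [li.text for li in doc.findall("./ul[@id='people']/li")]
# Drop the last name, like XPath 1.0 substring-after(., ', '), then sort
first_names = sorted(n.split(", ", 1)[1] for n in names)
# Count occurrences and take the most frequent first name
most_popular = Counter(first_names).most_common(1)[0][0]
# Back to selection: every li whose text contains the most popular name
with_popular = [li.text for li in doc.iter("li") if most_popular in li.text]
```

Here total is 20.0, the maximum is 10.0, the below-average items are "3" and "2", and the most popular first name is "Anna".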
Like another commenter says: parent/child. But also selecting by content (e.g. "click the button with the delete icon" or "find the link with '@harrypotter'"), selecting by attributes (e.g. "click the pager item that goes to the next page") or selecting items outside of body (e.g. og tags, title, etc.). All are doable with CSS3 selectors, but everything shouts that they are not meant for this, whereas XPath does it far more naturally.
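A small sketch of those three kinds of selection, again with Python's xml.etree.ElementTree, whose limited XPath subset happens to cover exact text matching ([.='...'], Python 3.7+) and attribute equality. The page markup, the rel='next' pager link and the icon-delete class are assumptions invented for the example; the comments show the corresponding full XPath.

```python
import xml.etree.ElementTree as ET

# Made-up page; rel='next' and class='icon-delete' are illustrative assumptions.
page = ET.fromstring(
    "<html><head><title>Profile</title>"
    "<meta property='og:title' content='Harry'/></head>"
    "<body>"
    "<a href='/u/1'>@ronweasley</a><a href='/u/2'>@harrypotter</a>"
    "<nav><a href='?p=1'>1</a><a rel='next' href='?p=2'>Next</a></nav>"
    "<button><i class='icon-delete'/></button><button><i class='icon-edit'/></button>"
    "</body></html>"
)

# By text content. XPath: //a[. = '@harrypotter']
link = page.find(".//a[.='@harrypotter']")

# By attribute. XPath: //a[@rel = 'next']
next_page = page.find(".//a[@rel='next']")

# By content that is a child element. XPath: //button[i[@class = 'icon-delete']]
# ElementTree predicates can't be nested, so the outer filter is plain Python.
delete_button = next(
    b for b in page.iter("button") if b.find("i[@class='icon-delete']") is not None
)

# Outside <body>. XPath: /html/head/title and //meta[@property = 'og:title']/@content
title = page.find("./head/title").text
og_title = page.find(".//meta[@property='og:title']").get("content")
```

Note that an exact @class comparison only works when the element has a single class; real pages usually need a token match.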
The element(s) before an element: //h3/preceding-sibling::p[1]
Match something's parent: //title/..
Match ancestors by name: //title[@id = 'abc']/ancestor::comment (use ancestor::* for all ancestors)
Element with src or href attribute: //*[@src or @href]
or multiple conditions: //article[@state = "approved" and not(comments/comment)]
Element with more than two li children: //ul[count(li) > 2]
Element with matching descendants: //article[.//video]
Element text containing substring: //p[contains(text(), "Foo")]
Attribute ending with a substring: //a[ends-with(@href, ".jpg")] (ends-with() needs XPath 2.0 or later; 1.0 only has starts-with() and contains())
Attribute values: //a/@href
Text values with spaces normalised: //a/normalize-space(text())
Match all attributes, child nodes, text nodes or comments: //user/@* or //user/node() or //user/text() or //user/comment()
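For comparison, here is roughly what a few of these return on a made-up document, using Python's xml.etree.ElementTree. It supports only simple predicates (no or, no count(), no multi-step paths inside [...]), so those parts are written as plain Python next to the XPath they stand in for. Treat it as an illustration of the semantics, not as an XPath engine.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
doc = ET.fromstring(
    "<body>"
    "<img src='a.jpg'/><a href='b.jpg'>photo</a><a>no href</a>"
    "<ul><li>1</li><li>2</li><li>3</li></ul><ul><li>x</li></ul>"
    "<article><div><video/></div></article><article><p>text</p></article>"
    "</body>"
)

# //*[@src or @href]: ElementTree predicates have no "or", so filter in Python
with_src_or_href = [
    e.tag for e in doc.iter() if "src" in e.attrib or "href" in e.attrib
]

# //ul[count(li) > 2]: count the direct li children of each ul
long_lists = [ul for ul in doc.iter("ul") if len(ul.findall("li")) > 2]

# //article[.//video]: the relative ".//video" searches inside each article only
with_video = [a for a in doc.iter("article") if a.find(".//video") is not None]

# //a[ends-with(@href, ".jpg")]: XPath 2.0+ function, plain str.endswith here
jpg_links = [a.text for a in doc.iter("a") if a.get("href", "").endswith(".jpg")]

# //li/..: ".." is in ElementTree's subset, and each parent is returned once
li_parents = doc.findall(".//li/..")
```

With an absolute //video inside the predicate, both articles would match as soon as any video existed anywhere in the document, which is why the relative form matters.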
Basically, from any node in a document you can select its ancestors, children, descendants, siblings, attributes, etc., and filtering has the same power as selecting. In CSS there's :not(), which can apply to selection or filtering, with :has() finally on the way and no :or(). CSS selectors match against HTML elements and they're great for that almost all of the time, but while you can filter by attribute value, including prefix, suffix and substring matches, for text there's only :empty.
But for a query syntax you need to be able to select attributes and text content as well as elements. Either extend XPath to support #id and .class syntax:
//#user-xyz//note/text()
//code.language-js/@name
or extend CSS to allow selecting attributes and text:
#user-xyz note :text
code.language-js @name
The former is more powerful, the latter a quick hack (if they only appear at the end of the selector anyway) with instant payoff.
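The "quick hack" option can be sketched as a tiny translator that turns a simple CSS-like selector, with an optional trailing @attr or :text step, into XPath 1.0. The function name css_to_xpath and the supported subset (tag, #id, .class and the whitespace descendant combinator only) are hypothetical choices for this sketch. Class matching uses the usual contains(concat(' ', normalize-space(@class), ' '), ' name ') idiom, because class is a whitespace-separated list.

```python
import re


def css_to_xpath(selector: str) -> str:
    """Hypothetical translator for a tiny CSS subset plus trailing '@attr' / ':text'.

    Supports steps like 'tag', '#id', '.class', 'tag.class#id' joined by
    whitespace (the descendant combinator). Anything else raises ValueError.
    """
    parts = selector.split()
    tail = ""
    # The proposed extensions are only allowed as the final step.
    if parts and parts[-1] == ":text":
        tail, parts = "/text()", parts[:-1]
    elif parts and parts[-1].startswith("@"):
        tail, parts = "/" + parts[-1], parts[:-1]

    xpath = ""
    for part in parts:
        m = re.fullmatch(r"([\w-]*)((?:[#.][\w-]+)*)", part)
        if m is None:
            raise ValueError(f"unsupported selector step: {part!r}")
        tag = m.group(1) or "*"
        predicates = []
        for kind, name in re.findall(r"([#.])([\w-]+)", m.group(2)):
            if kind == "#":
                predicates.append(f"@id='{name}'")
            else:
                # Match a whole class token, not a substring of another class.
                predicates.append(
                    f"contains(concat(' ', normalize-space(@class), ' '), ' {name} ')"
                )
        # Descendant combinator in CSS maps to // in XPath.
        xpath += "//" + tag + "".join(f"[{p}]" for p in predicates)
    return xpath + tail
```

For example, css_to_xpath("#user-xyz note :text") gives //*[@id='user-xyz']//note/text(), the same query as the //#user-xyz//note/text() form.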
You can do it either way in XPath, because path expressions and predicates can be used almost anywhere in a query:
# Find all elements li and select the parent element for each
//li/..
# Find all element nodes with a child element named li
//*[li]
# Non-abbreviated queries
/descendant::li/parent::*
/descendant::*[child::li]
# CSS using :has
:has(> li)
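Both abbreviated XPath forms happen to fall within the small subset that Python's xml.etree.ElementTree supports, so a minimal check that they select the same elements might look like this (the sample markup is made up):

```python
import xml.etree.ElementTree as ET

# Made-up sample: a ul and an ol containing li, and a p without any.
doc = ET.fromstring(
    "<div><ul id='a'><li>1</li><li>2</li></ul>"
    "<ol id='b'><li>3</li></ol><p id='c'>no list items</p></div>"
)

# //li/..  path form: step down to each li, then back up to its parent
by_path = [e.get("id") for e in doc.findall(".//li/..")]

# //*[li]  predicate form: any element that has an li child
by_predicate = [e.get("id") for e in doc.findall(".//*[li]")]
```

Both lists come out as ['a', 'b'].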
The Playwright people had to solve this for themselves: you can mix CSS and XPath selectors since the two syntaxes are distinguishable, and they added a few small custom extensions to help with selection. Playwright-compatible selectors would be nice.