by banthar 3995 days ago
Let's try an example: extract the address of the first link from https://news.ycombinator.com/.

As DOM query:

    document.getElementsByClassName("title")[0].parentElement.getElementsByTagName("a")[1].href
This will break:

* When title element no longer has "title" class.

* When title is no longer a sibling of link.

* When link is no longer 2nd link of its parent.
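To make those three failure points visible, here is a rough sketch of the same traversal with each step guarded. The helper name `firstLinkHref` and the choice to return `null` are my own assumptions for illustration, not anything the page provides. It only reports a failure more cleanly; it still depends on the same page layout as the one-liner.

```javascript
// Hypothetical helper: same traversal as the one-liner above, but each step
// is guarded so a layout change yields null instead of a TypeError.
function firstLinkHref(doc) {
  // undefined if no element carries the "title" class any more
  const title = doc.getElementsByClassName("title")[0];
  // the element that is assumed to contain both the title and the link
  const parent = title?.parentElement;
  // undefined if the parent no longer has a second <a> descendant
  const link = parent?.getElementsByTagName("a")[1];
  return link?.href ?? null;
}

// Only runs where a DOM exists, e.g. in the browser console.
if (typeof document !== "undefined") {
  console.log(firstLinkHref(document));
}
```

A `null` result tells you the query no longer matches, but not which of the three assumptions was violated.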

As regular expression:

    document.documentElement.innerHTML.match('td class="title">.*a href="([^"]*)"')[1]
This will break:

* On any white space change.

* On any new attributes on td or a.

* When ' is used instead of ".

* When the href includes an escaped ".

* In most of the cases where the DOM query breaks.
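A few of these are easy to reproduce on plain strings. In the sketch below the HTML snippets and the helper `firstLinkByRegex` are made up for illustration (they are not the real Hacker News markup); only the pattern is the one from above.

```javascript
// Hypothetical helper: applies the pattern from above to an HTML string
// and returns the captured href, or null when the pattern does not match.
function firstLinkByRegex(html) {
  const m = html.match('td class="title">.*a href="([^"]*)"');
  return m ? m[1] : null;
}

// Made-up markup: the first line is the shape the pattern expects,
// the others are semantically equivalent variations.
const original     = '<td class="title"><a href="http://example.com/">Story</a></td>';
const extraSpace   = '<td  class="title"><a href="http://example.com/">Story</a></td>';
const extraTdAttr  = '<td class="title" align="right"><a href="http://example.com/">Story</a></td>';
const extraAAttr   = '<td class="title"><a rel="nofollow" href="http://example.com/">Story</a></td>';
const singleQuotes = "<td class='title'><a href='http://example.com/'>Story</a></td>";

console.log(firstLinkByRegex(original));     // http://example.com/
console.log(firstLinkByRegex(extraSpace));   // null
console.log(firstLinkByRegex(extraTdAttr));  // null
console.log(firstLinkByRegex(extraAAttr));   // null
console.log(firstLinkByRegex(singleQuotes)); // null
```

A browser parses all five snippets into the same kind of cell containing a link to the same address, so a DOM query treats them alike, while the pattern matches only the first.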

Many of those can happen without any server-side changes. The expression will sometimes work and sometimes not, which makes it hard to test.

There are cases where a regular expression will break less often than a DOM query, but the DOM is easier to reason about, more predictable, and has fewer corner cases.