

by 1vuio0pswjnm7 2151 days ago

Approaching the web as an end user, I have also found this to be true. Most websites rarely change their document structure in a way that breaks simple text-editing scripts. Keyword: "most". In most cases no specialised tools or libraries are needed for extracting text or other resources. Again, keyword: "most". Personally, the existence of a few exceptions is not going to make me change a strategy that works almost 100% of the time.

The point of "web APIs", which did not exist when I first started using the www in 1993, continues to escape me, other than as a way to try to control and/or monetise scraping. I do like the increased usage of "endpoints" though, serving only data with no markup, although XML and JSON are too bloated compared to something sensible like netstrings.
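For anyone who has not met netstrings: the format is just the payload length in ASCII decimal, a colon, the raw bytes, and a trailing comma. A minimal sketch in Python follows; the function names `encode_netstring` and `decode_netstring` are made up for illustration and are not from any particular library.

```python
def encode_netstring(payload: bytes) -> bytes:
    """Wrap payload as <length>:<payload>, (length in ASCII decimal)."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(data: bytes) -> tuple[bytes, bytes]:
    """Decode one netstring from the front of data.

    Returns (payload, remaining_bytes). Raises ValueError on malformed input.
    """
    length_text, sep, rest = data.partition(b":")
    if not sep or not length_text.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_text)
    # The payload must be followed by exactly one comma.
    if len(rest) < length + 1 or rest[length:length + 1] != b",":
        raise ValueError("payload truncated or missing trailing comma")
    return rest[:length], rest[length + 1:]


if __name__ == "__main__":
    encoded = encode_netstring(b"hello world!")
    print(encoded)
    print(decode_netstring(encoded))
```

Because the length comes first, a reader never has to scan for delimiters or handle escaping, which is most of the appeal compared with XML or JSON.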

Similarly, on the client side, I fail to understand the appeal of all the parsing tools and libraries and the related promotion. Any solution that depends on them is just as easy to break, and in many cases they are obviously overkill and more brittle than simple scripts using generalised text-editing tools.

One example is "jq". In many cases it is clearly overkill and is slower than sed.

https://stackoverflow.com/q/59806699

As a data source, the web is messy. "Standards" cannot be relied on 100%. Some people try to pretend the web is clean and can be tamed, or they "give up" because it is not "perfect" and things can break. Getting your hands dirty works best, and most things do not break if kept simple, IME.