Hacker News
by simonw 1753 days ago
I think we can at least agree that there is no formal definition of "scraping".

That said, I had a look around and the definitions I could find tended to support my interpretation:

https://en.wikipedia.org/wiki/Web_scraping - "Newer forms of web scraping involve monitoring data feeds from web servers. For example, JSON is commonly used as a transport storage mechanism between the client and the web server."

https://towardsdatascience.com/web-scraping-basics-82f8b5acd... - "There are 2 different approaches for web scraping depending on how does website structure their contents." (HTML scraping and API access)

https://realpython.com/beautiful-soup-web-scraper-python/ - "Web scraping is the process of gathering information from the Internet. Even copying and pasting the lyrics of your favorite song is a form of web scraping! However, the words “web scraping” usually refer to a process that involves automation"

The more formal dictionaries (Merriam-Webster and suchlike) don't seem to have formed an opinion on this one yet!


I think the definitions you're cherry-picking are examples of the erosion of the term's specificity. We need a word to describe scraping data values from a body of human-oriented markup. These folks are pushing the limits of ambiguity to rob us of that, and our reward is yet another word that just means using any API.

What is the upside of using this word in such an oddly vague, expansive way? What happens to those of us who need to convey the original specific meaning we coined it for in the first place?

"We need a word to describe scraping data values from a body of human-oriented markup"

I call that parsing, which is one of the steps in scraping that may or may not be necessary depending on the data source.

I need a term that means "using automation to gather data from the web, when that data has not been published in a way that is suitable for my purposes". Scraping works great for that!
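
A minimal sketch of the distinction being argued here, assuming two hypothetical payloads that carry the same values: one as human-oriented HTML, one as a JSON feed. The sample data, the `price` class name, and the function names are all made up for illustration. With the HTML source the values have to be parsed out of the markup; with the JSON source that step mostly disappears.

```python
import json
from html.parser import HTMLParser

# Hypothetical sample payloads; neither comes from a real site.
HTML_PAGE = '<ul><li class="price">19.99</li><li class="price">5.00</li></ul>'
JSON_FEED = '{"prices": [19.99, 5.00]}'


class PriceParser(HTMLParser):
    """Collect the text of every <li class="price"> element."""

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "li" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(float(data))


def prices_from_html(page):
    """HTML source: the values must be parsed out of the markup."""
    parser = PriceParser()
    parser.feed(page)
    parser.close()
    return parser.prices


def prices_from_json(feed):
    """JSON source: the data is already structured, so decoding is enough."""
    return json.loads(feed)["prices"]


if __name__ == "__main__":
    print(prices_from_html(HTML_PAGE))
    print(prices_from_json(JSON_FEED))
```

Both functions return the same list of numbers. Whether both count as "scraping" or only the first does is exactly what this thread disagrees about.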