|
|
|
|
|
by addingnumbers
1753 days ago
|
|
APIs are rate-limited all the time. Compensating for being throttled or paginated isn't scraping. Scraping is the act of extricating data from the layout and markup metadata meant to make it pretty for humans. APIs generally don't include any of that, your HTML-in-a-JSON-object example notwithstanding. I'd have no objection to calling it scraping when you strip those <P> tags, but aggregating the results of several API queries is bog-standard textbook API usage, which we use the term scraping to differentiate from. |
|
That said, I had a look around and the definitions I could find tended to support my interpretation:
https://en.wikipedia.org/wiki/Web_scraping - "Newer forms of web scraping involve monitoring data feeds from web servers. For example, JSON is commonly used as a transport storage mechanism between the client and the web server."
https://towardsdatascience.com/web-scraping-basics-82f8b5acd... - "There are 2 different approaches for web scraping depending on how does website structure their contents." (HTML scraping and API access)
https://realpython.com/beautiful-soup-web-scraper-python/ - "Web scraping is the process of gathering information from the Internet. Even copying and pasting the lyrics of your favorite song is a form of web scraping! However, the words “web scraping” usually refer to a process that involves automation"
The more formal dictionaries (Merriam Webster and suchlike) don't seem to have formed an opinion on this one yet!