by simonw
1759 days ago
Disagree. I see scraping as about obtaining data in bulk that hasn't been deliberately packaged up for you to use as-is. Most APIs are not designed to give you all of the data at once - they exist to serve other purposes, usually involving returning a small subset of the data to power a user-facing feature.

If someone asks me "where did you get those Olympic medal results?" and I say "I scraped them" I think that's accurate vocabulary whether I parsed HTML or gathered them from hundreds of undocumented API calls. If I had downloaded a neat CSV file from the Olympics website with all of the data I needed in one go I wouldn't feel comfortable calling it scraping.

Re-reading your comment, I think what I'm describing here does actually fit with your "how difficult it is to extract data from a format that was patently not intended to efficiently spread raw data to other machines" definition - except I'm including APIs that return only a subset of the data as part of those inefficiencies in obtaining the raw data.

Scraping is the act of extricating data from the layout and markup metadata meant to make it pretty for humans.
APIs generally don't include any of that, your HTML-in-a-JSON-object example notwithstanding.
I'd have no objection to calling it scraping when you strip those <P> tags, but aggregating the results of several API queries is bog-standard textbook API usage, which is exactly what the term scraping is meant to be distinguished from.
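
To make the two cases in this thread concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the payloads, the `results` and `body` field names, and the helper names `aggregate_pages` and `strip_markup` are hypothetical and do not come from any real API. The first helper merges several paginated API responses; the second pulls plain text out of HTML embedded in a JSON value.

```python
import json
from html.parser import HTMLParser

# Hypothetical paginated API responses (placeholder data, not real results).
PAGES = [
    '{"results": [{"team": "AAA", "gold": 3}]}',
    '{"results": [{"team": "BBB", "gold": 1}]}',
]

# Hypothetical API response that wraps presentation markup in JSON.
HTML_IN_JSON = '{"body": "<p>AAA: 3 gold</p><p>BBB: 1 gold</p>"}'


def aggregate_pages(pages):
    """Combine the 'results' lists of several JSON responses into one list."""
    rows = []
    for raw in pages:
        rows.extend(json.loads(raw)["results"])
    return rows


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(html):
    """Return the text content of an HTML fragment, one text node per line."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(part.strip() for part in parser.parts if part.strip())


rows = aggregate_pages(PAGES)
text = strip_markup(json.loads(HTML_IN_JSON)["body"])
```

Whether the first helper counts as scraping is the point under dispute above; the second is the case both comments agree on.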