by addingnumbers
1753 days ago
Scraping refers specifically to extracting data from a format designed to be read by humans instead of machines. The gross inefficiency and low data-to-layout ratio are the key things being expressed through connotations of the word "scrape". To scrape is to extract a small amount of something from a much larger substrate. To call every query a scrape is to diminish the specificity and utility of the term.
I would argue that I am, even though it started out as a JSON wrapper.
"To call every query a scrape is to diminish the specificity and utility of the term."
Absolutely disagree with you there. I interpret the term "scraping" as "writing code that gathers data from a source that has not deliberately published that data in a usable format". Gathering data from any kind of API fits that criterion for me, since most APIs only give you a subset of the data at a time.
I think the reason I care so much about this is that I coined the term "git scraping" to cover a variant of scraping that uses Git repositories to store the data and track changes over time - and git scraping applies equally to data sourced from APIs as it does to data sourced from HTML pages. https://simonwillison.net/2020/Oct/9/git-scraping/
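For anyone who hasn't seen the pattern, here is a minimal sketch of the idea in Python. It is an illustration under my own assumptions, not the code from the linked post: the function names `fetch_json`, `save_snapshot` and `commit_snapshot` are hypothetical, and it assumes the target directory is already an initialized Git repository with `git` on the PATH.

```python
import json
import subprocess
import urllib.request
from pathlib import Path


def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def save_snapshot(data, path):
    """Write data as sorted, indented JSON; return True if the file changed.

    Sorting keys keeps the output stable, so a Git diff only shows
    real changes in the data rather than incidental key reordering.
    """
    path = Path(path)
    new_text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True


def commit_snapshot(repo_dir, filename, message):
    """Stage and commit one file inside an existing Git repository (assumed)."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
```

Run on a schedule, you would call `fetch_json` on whatever source you care about, pass the result to `save_snapshot`, and only call `commit_snapshot` when it returns `True`. The commit history then becomes the record of how the data changed over time, and nothing in that flow depends on whether the source was an HTML page or an API.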