Not only can you, but in my experience it's substantially less drama and arguably less load on the target system, since the full page may make many, many other requests that a presentation layer cares about and I don't.
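For anyone who hasn't done it, here is a minimal sketch of what hitting the endpoint directly with a reasonable rate limit might look like. It assumes the data comes from an unauthenticated JSON endpoint you've already spotted in the browser's network tab. `MinIntervalLimiter` and `fetch_json` are hypothetical names for illustration, not a library API, and auth (the first trade-off below) is deliberately left out.

```python
import json
import time
import urllib.request


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests (a simple rate limit)."""

    def __init__(self, min_interval_s: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock  # injectable so the limiter can be tested without real waiting
        self._sleep = sleep
        self._last = None

    def wait(self) -> float:
        """Block until min_interval_s has passed since the previous call; return seconds slept."""
        now = self._clock()
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


def fetch_json(url: str, limiter: MinIntervalLimiter, headers=None, timeout: float = 30.0):
    """Fetch one JSON endpoint directly instead of rendering the whole page.

    Assumes a UTF-8 JSON response body and no authentication.
    """
    limiter.wait()
    req = urllib.request.Request(url, headers=headers or {"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `limiter = MinIntervalLimiter(2.0)` followed by `fetch_json("https://example.com/api/items?page=1", limiter)` in a loop, where the URL is a placeholder. One request every couple of seconds is the kind of "reasonable" pacing I mean; the actual number is a judgment call per site.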
The trade-offs usually fall into:
- authing to the endpoint can sometimes be weird
- it for sure makes the traffic stand out, since it isn't surrounded by those extraneous requests
- it, as with all good things scraping, carries its own maintenance and monitoring burden
However, in the same vein as those trade-offs, it's also been my experience that a full page load offers a ton more tracking opportunities than a direct endpoint fetch does. I mean, look at how many "stealth" plugins are out there designed to mask the fact that a headless browser is headless.
But, having said all of that: without question, the biggest risk to modern-day scraping is Cloudflare and Akamai gatekeeping. I do appreciate the "but ddos!11" arguments, and yet I would rather only actors actually exhibiting bad behavior[1] be blocked, instead of everyone with a copy of Python who has set reasonable rate limits.
1 = and that's setting aside that "bad behavior" can be defined as "downloading data that the site makes freely available to Chrome but not to Python"