|
|
|
|
|
by 10000truths
504 days ago
|
|
The issue is that DOM structure does not correspond one-to-one with perceived structure. I could render things in the DOM that aren't visible to people (e.g. a transparent 5px x 5px button), or render things to people that aren't visible in the DOM (e.g. Facebook's DOM obfuscation shenanigans to evade ad-blocking, or rendering custom text to a WebGL canvas). Sure, most websites don't go that far, but most websites also aren't valuable targets for automated crawling/scraping. These kinds of disparities will be exploited to detect and block automated agents if browser automation becomes sufficiently popular, and then we're back to needing to render the whole browser and operate on the rendered image to keep ahead of the arms race. |
|
That's a problem of misaligned economic incentives. If there is a blockchain which enables micro-transactions of 0.000001 cent per request, and in the order of a million tps or a billion tps, then servers have no reason not to accept money in exchange for information, instead of using ads to extract some eyeball attention.
There is no reason that i cannot invoke a command line program: `$fetch_social_media_posts -n 1000` and get the last thousand posts right there in the console, as long as i provide some valid transactions to the server.
Websites and ads are the wrong solution to the problem of gaining something while serving information, and headless browsers and scraping are the wrong solution to the first wrong solution and the problems it creates.