| HN Mirror

	by andai 624 days ago
	Yesterday I considered writing a web scraper completely from scratch (just sockets). Without HTTPS, this is trivial. Of course, you lose out on much (most?) of the web, but I have a feeling most small / interesting sites would still be accessible. I have found that, given a random sampling of web content, only an extremely small fraction of it is interesting or useful to me (and hardly any of it is what I would consider high quality enough to use as the basis for the future governors of mankind!)
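For illustration, here is a rough sketch of what a sockets-only fetch over plain HTTP might look like. It is not the poster's code: the function names (`build_request`, `parse_response`, `fetch`) and the `User-Agent` string are made up for this example, and it speaks HTTP/1.0 on the assumption that this sidesteps chunked transfer encoding, so the body is simply everything after the headers until the server closes the connection.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: tiny-scraper/0.1",  # made-up identifier for this sketch
        "Connection: close",
        "",  # the two empty entries produce the blank line ending the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host: str, path: str = "/", port: int = 80, timeout: float = 10.0):
    """Fetch one page over plain HTTP: no TLS, no redirects, no retries."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection: response complete
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Calling `status, headers, body = fetch("example.com")` would return the status code, a dict of lower-cased header names, and the raw body bytes. Redirects, compression, and character-set detection are all left out of this sketch.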

1 comments

sneak 624 days ago

Even if you moved the entire TLS web to non-TLS, this is no longer trivial. The web requires Javascript to render, full stop. Fetching and parsing HTML alone is totally insufficient.

Am4TIfIsER0ppos 623 days ago

> The web requires Javascript to render, full stop.

Then how the fuck am I reading this, let alone replying?

benterix 624 days ago

> The web requires Javascript to render, full stop.

A small correction: some parts of the new web require JavaScript to render.

That's why on many websites the experience is better without JS. To be more specific, several paywalled websites can be accessed just by turning JS off. You could even say the opposite is true in these cases: JS is being used to prevent text rendering.
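As a rough illustration of reading a page without running its scripts, the sketch below pulls the text out of static HTML using Python's standard `html.parser` module and ignores anything inside `<script>` and `<style>`. The names `TextExtractor` and `visible_text` are made up for this example, and it assumes the article text is actually present in the HTML the server sends, which will not hold for every site.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def visible_text(html: str) -> str:
    """Return the page's text joined by single spaces (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Nothing here executes JavaScript, so a script that would hide or replace the text never gets the chance to run.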

andai 623 days ago

A while back I disabled JS in my browser. I think I even disabled image loading. This resulted in a vastly improved experience. You'd think mere adblock would get you most of the way there, but the difference is staggering.

fwsgonzo 623 days ago

That's been my way of browsing for a while now and I agree it works for the most part. I have no intention of going back.

It's especially nice to have JavaScript disabled by default, so I can enable one script at a time until the page becomes readable. But not so many scripts that it becomes unreadable again.