Hacker News
by andai 624 days ago
Yesterday I considered writing a web scraper completely from scratch (just sockets). Without HTTPS, this is trivial. Of course, you lose out on much (most?) of the web, but I have a feeling most small / interesting sites would still be accessible.

I have found that, given a random sampling of web content, an extremely small fraction of it is interesting or useful to me (nor is much of it what I would consider high quality enough to serve as the basis for the future governors of mankind!)
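Over plain HTTP, the "just sockets" approach really is short. Below is a minimal Python sketch, assuming a server that still answers on port 80 without redirecting to HTTPS; the function names and the User-Agent string are hypothetical. It sends an HTTP/1.0 request with `Connection: close`, so the response ends when the server closes the socket and there is no need to handle `Content-Length` or chunked transfer encoding.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "User-Agent: tiny-scraper/0.1",  # hypothetical UA string
        "Connection: close",  # server closes the socket when done
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


def fetch(host: str, path: str = "/", port: int = 80) -> tuple[int, dict[str, str], bytes]:
    """Fetch a page over plain HTTP (no TLS) using only a TCP socket."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # connection closed: response is complete
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

Usage would look like `status, headers, body = fetch("example.com")`. Redirects and error handling are left out; a 301 pointing at an `https://` URL is exactly where this approach stops working.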

Even if you moved the entire TLS web to non-TLS, it would no longer be trivial. The web requires Javascript to render, full stop. Fetching and parsing HTML alone is totally insufficient.
> The web requires Javascript to render, full stop.

Then how the fuck am I reading this, let alone replying to it?

> The web requires Javascript to render, full stop.

A small correction: some parts of the new web require JavaScript to render.

That's why on many websites the experience is better without JS. For example, several paywalled websites can be read just by turning JS off. You could even say the opposite is true in these cases: JS is being used to prevent the text from rendering.
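On sites where the paywall is applied client-side, the article text is often already in the HTML the server sends, and a script hides it afterwards. A minimal sketch using Python's standard `html.parser` to pull the text out while ignoring `<script>` and `<style>` contents; the class and function names and the sample markup are made up for illustration, and real pages need more care (hidden elements, layout whitespace).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content while skipping <script> and <style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page text with one chunk per line (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

For a page whose only "protection" is a script, the text comes straight through without executing anything.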

A while back I disabled JS in my browser. I think I even disabled image loading. This resulted in a vastly improved experience. You'd think an ad blocker alone would get you most of the way there, but the difference is staggering.
That's been my way of browsing for a while now, and I agree it works for the most part. I have no intention of going back.

It's especially nice to have JavaScript disabled by default, so I can enable one script at a time until the page becomes readable, but not so many scripts that it becomes unreadable again.
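Enabling scripts one at a time amounts to a default-deny allowlist. A minimal sketch of that policy, assuming permission is granted per script host (a granularity chosen here for illustration, not necessarily how any particular blocker works); `script_allowed` is a hypothetical name, and inline scripts without a `src` are out of scope.

```python
from urllib.parse import urljoin, urlsplit


def script_allowed(page_url: str, script_src: str, allowed_hosts: set[str]) -> bool:
    """Hypothetical default-deny check: run a script only if its host is allowed.

    Relative src values are resolved against the page URL first, so a
    first-party script like "/app.js" maps to the page's own host.
    """
    host = urlsplit(urljoin(page_url, script_src)).hostname
    return host is not None and host in allowed_hosts
```

The workflow from the comment is then just adding one host at a time to `allowed_hosts` and reloading until the page works.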