by chrismorgan 2002 days ago
I’m curious about the HTTP/0.9 stuff. Last time I checked (four or five years ago, when I wrote a trivial HTTP/0.9 server as the first step of a “writing HTTP servers” tutorial series that never materialised), Firefox either wouldn’t render the output at all, or would render it as though it had been a text/plain response (I can’t remember which). In other words, it’s not a real web page. I would expect precisely zero occurrences among the top million pages, not the thirty that 0.003% of a million works out to. I think it’s far more likely (p ≈ 0.9999) that these indicate some sort of error.

(For those unfamiliar with it, HTTP/0.9 is the label for a protocol where the client opens a TCP connection to the server and sends “GET /”, and the server responds with the body, verbatim, and closes the connection. No status codes, no headers, nothing.)
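
To make that concrete, here is a rough sketch in Python of the exchange described above: a one-shot server that answers “GET /” with a bare body and then closes, plus a client that reads until the connection drops. The names (BODY, serve_once, fetch, run_demo) are made up for illustration, and it only handles the happy path on localhost; it is not the server from the tutorial mentioned above.

```python
import socket
import threading

# Hypothetical page content; any bytes would do.
BODY = b"<html><body>Hello from HTTP/0.9</body></html>\n"


def serve_once(listener):
    """Accept one connection, answer it HTTP/0.9-style, then close it."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Read until the end of the single request line.
        while not request.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        if request.split()[:1] == [b"GET"]:
            # Body only: no status line, no headers.
            conn.sendall(BODY)
    # Leaving the with-block closes the socket; that close is the only
    # end-of-response signal the client gets.


def fetch(host, port, path="/"):
    """Send a bare request line and read until the server closes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(b"GET " + path.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def run_demo():
    """Start the one-shot server on a free local port and fetch from it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_once, args=(listener,))
    server.start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        server.join()
        listener.close()


if __name__ == "__main__":
    print(run_demo().decode("ascii"))
```

Note that what comes back does not begin with “HTTP/”, so a client has nothing to go on except the raw bytes. That is presumably why a scanner can only infer HTTP/0.9 from the absence of a status line, which is also what a broken or non-HTTP service would look like.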

1 comment

With any large-scale scan dataset like this, noise is inevitable. Legitimate use of HTTP/0.9 in consumer-facing web servers is exceptionally unlikely, but there are all sorts of scenarios which could have led HTTP/0.9 responses to bleed into the data.

For instance, here is an untested hypothesis: ~30 of the hostnames on the list have abandoned DNS A records, pointing to EC2 servers. Those EC2 IPs have since been repurposed as honeypots of some kind. The honeypots present themselves as HTTP/0.9, in order to look more like low-grade IoT devices.

That hypothesis is almost certainly wrong, but you could quickly invent another and at some point one of them will be correct. The internet is just a very messy place.