by pg 5325 days ago
I believe Rtm has already set one up.
2 comments

The conspicuous lack of a "Server:" header inclines me to believe that that's probably not the case (most web servers set one indicating the server software and version). Here are the headers that HN sends out from an old post (20 days ago):

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=utf-8
  Cache-Control: private
  Connection: close
  Cache-Control: max-age=0

My favorite part of HN's headers: the lines are separated by naked LFs instead of CRLF, in violation of the HTTP spec.

This is a common violation that everyone accepts. It's definitely done by 'bad' clients; I'm not sure how often servers send a bare LF.

(I used to telnet to port 80 for testing, and type GET / HTTP/1.0 <enter> <enter>, and that should be LF on Linux & Mac)
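
For anyone who wants to check this for themselves, here is a minimal Python sketch that counts CRLF versus bare-LF line endings in a raw header block, and splits the lines the way a lenient parser would. The function names and both sample byte strings are made up for illustration; they are not a capture of HN's actual response.

```python
def count_line_endings(raw_headers: bytes) -> tuple[int, int]:
    """Return (crlf_count, bare_lf_count) for a raw HTTP header block."""
    crlf = raw_headers.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get the bare ones.
    bare_lf = raw_headers.count(b"\n") - crlf
    return crlf, bare_lf


def split_header_lines(raw_headers: bytes) -> list[bytes]:
    """Split on LF and drop any trailing CR, so both styles parse the same."""
    return [line.rstrip(b"\r") for line in raw_headers.split(b"\n") if line.strip()]


# Hypothetical samples: the same two header lines with each line-ending style.
strict = b"HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\n\r\n"
sloppy = b"HTTP/1.1 200 OK\nContent-Type: text/html; charset=utf-8\n\n"

print(count_line_endings(strict))  # (3, 0)
print(count_line_endings(sloppy))  # (0, 3)
print(split_header_lines(strict) == split_header_lines(sloppy))  # True
```

The last line is the reason nobody notices: a tolerant parser produces the same header lines either way.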

You don't have a problem with one of the most trafficked sites for programming/web startup-related news implementing HTTP incorrectly?

Do you ignore whether your HTML is valid just because the browser rendered it correctly?

Yup.

I've got real work to do. Making a validator happy is fake work.

By ensuring that your pages are valid, you make it much more likely that you won't have to scramble at a most inopportune time, when a new version of a browser comes out that handles your non-standards-compliant tag soup differently than the current version does.

So, do you want to pay the price upfront when you can plan for it or afterwards when the fix must be done immediately because customers are complaining?

Some of us actually care about interoperability, maintainability and writing good code in general as opposed to just cowboying stuff together as quickly as possible.

Why don't you bother doing your real work right the first time? As long as there's a well defined spec, you might as well follow it instead of being creative and original when it comes to implementing standards.

You don't know that everyone accepts it. Even if they did, it doesn't make it right.

I submitted a patch for this in the pecl_http PHP library:

https://bugs.php.net/bug.php?id=58442

We use Varnish for caching and check the User-Agent header on each request.

If the cache has a copy of an article that is a few hours old, it will just give that version to Googlebot, while if it thinks a human is requesting the page, it will go to the backend and fetch the latest version.

https://www.varnish-cache.org/lists/pipermail/varnish-misc/2...
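
Roughly, the decision described above looks like the Python sketch below. This is not VCL and not our actual config, just the logic. The names (`is_bot`, `serve`, `BOT_MARKERS`), the three-hour cutoff standing in for "a few hours", and the dict used as a cache are all assumptions made for illustration.

```python
import time

BOT_MARKERS = ("googlebot",)     # assumption: substrings that identify a crawler
MAX_STALE_SECONDS = 3 * 60 * 60  # assumption: "a few hours" taken as three


def is_bot(user_agent: str) -> bool:
    """Crude check: does the User-Agent string mention a known crawler?"""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def serve(url, user_agent, cache, fetch_backend, now=None):
    """Give crawlers a cached copy if it is recent enough; otherwise hit the backend."""
    now = time.time() if now is None else now
    entry = cache.get(url)  # (body, stored_at) or None
    if entry is not None and is_bot(user_agent):
        body, stored_at = entry
        if now - stored_at <= MAX_STALE_SECONDS:
            return body
    # Humans, or crawlers with no usable copy, get a fresh page.
    body = fetch_backend(url)
    cache[url] = (body, now)
    return body


cache = {"/article": ("old version", 0.0)}
fresh = lambda url: "latest version"

print(serve("/article", "Googlebot/2.1", cache, fresh, now=3600.0))
print(serve("/article", "Mozilla/5.0", cache, fresh, now=3600.0))
```

With the sample data, the crawler request returns the cached copy and the browser request returns the fresh page and refreshes the cache.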

+1 for varnish. It's stupidly[1] fast and there shouldn't be much trickery required to deflect most of HN's traffic (e.g. ~10 sec expiry for "live" pages, infinite expiry for archived pages).

[1] 15k reqs/sec on a moderate box
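
As a sketch of that expiry policy, the backend could pick a `Cache-Control` value per page and let the cache in front of it do the rest. The function name is hypothetical, and the one-year value is only a stand-in for "infinite", which is an assumption on my part.

```python
LIVE_TTL_SECONDS = 10                     # the "~10 sec expiry" for live pages
ARCHIVE_TTL_SECONDS = 365 * 24 * 60 * 60  # assumption: one year stands in for "infinite"


def cache_control_for(is_archived: bool) -> str:
    """Pick a Cache-Control header value based on whether the page can still change."""
    ttl = ARCHIVE_TTL_SECONDS if is_archived else LIVE_TTL_SECONDS
    return f"max-age={ttl}"


print(cache_control_for(False))  # max-age=10
print(cache_control_for(True))   # max-age=31536000
```

How a page gets flagged as archived is left out here; that depends on the site.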