The conspicuous lack of a "Server:" header makes me think that's probably not the case (most web servers set one indicating the server software and version). Here are the headers that HN sends for an old post (20 days ago):
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: private
Connection: close
Cache-Control: max-age=0
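For anyone who wants to check this sort of thing mechanically, here is a minimal Python sketch that parses the header block above and looks for a "Server" header. The names `RAW` and `parse_headers` are made up for illustration. Repeated headers such as Cache-Control are collected into a list and joined with commas, which is how HTTP treats repeated list-valued headers.

```python
# The response head quoted above, pasted in verbatim.
RAW = """HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: private
Connection: close
Cache-Control: max-age=0"""


def parse_headers(raw):
    """Return (status_line, {lowercased-name: [values, ...]})."""
    lines = raw.strip().splitlines()
    status, headers = lines[0], {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status, headers


status, headers = parse_headers(RAW)
has_server = "server" in headers
# Repeated list-valued headers combine into one comma-separated value.
cache_control = ", ".join(headers.get("cache-control", []))
```

With the headers above, `has_server` comes out false and the two Cache-Control lines combine into a single "private, max-age=0" policy.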
By ensuring that your pages are valid, you make it much more likely that you won't have to scramble at an inopportune moment when a new version of a browser comes out and handles your non-standards-compliant tag soup differently than the current version does.
So, do you want to pay the price up front, when you can plan for it, or afterwards, when the fix must be done immediately because customers are complaining?
Some of us actually care about interoperability, maintainability, and writing good code in general, as opposed to just cowboying stuff together as quickly as possible.
Why don't you bother doing your real work right the first time? As long as there's a well defined spec, you might as well follow it instead of being creative and original when it comes to implementing standards.
We use Varnish for caching and check the User-Agent header on requests.
If the cache has a copy of an article that is a few hours old, it just serves that version to Googlebot; if it thinks a human is requesting the page, it goes to the backend and fetches the latest version.
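To make the idea concrete, here is a rough Python sketch of that decision logic. It is not Varnish configuration and not our actual setup: the crawler marker list, the six-hour staleness limit (standing in for "a few hours"), and the names `is_crawler` and `serve` are all assumptions for illustration.

```python
import time

# Assumed substrings that identify crawlers; a real list would be longer.
BOT_MARKERS = ("googlebot", "bingbot", "slurp")
# Assumption: "a few hours" is taken to mean six hours here.
MAX_STALE_FOR_BOTS = 6 * 3600


def is_crawler(user_agent):
    """Guess from the User-Agent string whether the client is a crawler."""
    ua = (user_agent or "").lower()
    return any(marker in ua for marker in BOT_MARKERS)


def serve(url, user_agent, cache, fetch_backend, now=None):
    """Serve crawlers a cached copy if one is recent enough; otherwise
    (humans, or no usable copy) fetch from the backend and refresh the cache."""
    now = time.time() if now is None else now
    entry = cache.get(url)  # (body, stored_at) or None
    if entry is not None and is_crawler(user_agent):
        body, stored_at = entry
        if now - stored_at <= MAX_STALE_FOR_BOTS:
            return body
    body = fetch_backend(url)
    cache[url] = (body, now)
    return body
```

Here `cache` is any dict-like object and `fetch_backend` is whatever callable produces the fresh page, so the policy can be tried out without a real server.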
+1 for Varnish. It's stupidly[1] fast and there shouldn't be much trickery required to deflect most of HN's traffic (e.g. ~10 sec expiry for "live" pages, infinite expiry for archived pages).
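As a sketch of that expiry policy (in Python, not Varnish config): the 14-day cutoff for what counts as "archived" is purely an assumption, and "infinite" is approximated by one year, which is the usual practical ceiling for max-age. `cache_control_for` is a hypothetical helper name.

```python
LIVE_TTL_SECONDS = 10            # the ~10 sec expiry for "live" pages
ARCHIVE_AFTER_DAYS = 14          # assumption: when a page counts as archived
ONE_YEAR_SECONDS = 365 * 24 * 3600  # stand-in for "infinite" expiry


def cache_control_for(age_days):
    """Pick a Cache-Control value based on how old the page is, in days."""
    if age_days >= ARCHIVE_AFTER_DAYS:
        return f"public, max-age={ONE_YEAR_SECONDS}"
    return f"public, max-age={LIVE_TTL_SECONDS}"
```

Under these assumptions, the 20-day-old post mentioned earlier would fall into the archived bucket, while the front page would be refetched from the backend at most once every 10 seconds.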