
by wheels 5325 days ago

Have you considered putting a caching reverse proxy in front of the arc app to keep the backend from having to render all of the old pages?

It seems like the only dynamic element of old articles is the "$x days ago" bit, and that'd be pretty easy to make static: put timestamps in the actual HTML and use JavaScript to convert them into how many hours / days ago they were. Then the crawlers would just be pulling out cached, pre-rendered HTML.
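As a rough sketch of the arithmetic involved (in the browser this would be JavaScript; Python is used here only to show the logic), the conversion from an embedded timestamp to an "x days ago" label could look like the following. The function name `relative_age` and the minute/hour/day cutoffs are assumptions for illustration, not how HN actually does it.

```python
from datetime import datetime, timedelta, timezone


def relative_age(posted: datetime, now: datetime) -> str:
    """Turn an absolute post time into a label such as '20 days ago'."""
    # Clamp to zero so clock skew never produces a negative age.
    seconds = max(0, int((now - posted).total_seconds()))
    if seconds < 3600:
        n, unit = seconds // 60, "minute"
    elif seconds < 86400:
        n, unit = seconds // 3600, "hour"
    else:
        n, unit = seconds // 86400, "day"
    suffix = "" if n == 1 else "s"
    return f"{n} {unit}{suffix} ago"


if __name__ == "__main__":
    posted = datetime(2011, 1, 1, tzinfo=timezone.utc)
    print(relative_age(posted, posted + timedelta(days=20)))
```

Because the page itself would then carry only the absolute timestamp, the same bytes could be served from a cache indefinitely.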

There's an example of doing this with nginx here:

http://serverfault.com/questions/30705/how-to-set-up-nginx-a...

With that you'd just have to send out the HTTP header from the arc app saying that current articles expire immediately, and old ones don't.
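A minimal sketch of that header decision, assuming a hypothetical 14-day cutoff for what counts as "old" and a one-year lifetime for archived pages (neither number comes from the thread, and the real app is written in Arc, not Python):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy values, chosen only for illustration.
ARCHIVE_AFTER = timedelta(days=14)
ARCHIVE_MAX_AGE = 365 * 24 * 3600  # seconds


def cache_control_for(posted: datetime, now: datetime) -> str:
    """Pick a Cache-Control value based on how old the article is."""
    if now - posted >= ARCHIVE_AFTER:
        # Old article: let the proxy keep it for a long time.
        return f"public, max-age={ARCHIVE_MAX_AGE}"
    # Current article: expires immediately, so the proxy asks the backend.
    return "max-age=0"


if __name__ == "__main__":
    now = datetime(2011, 2, 1, tzinfo=timezone.utc)
    print(cache_control_for(now - timedelta(days=20), now))
    print(cache_control_for(now - timedelta(hours=2), now))
```

The proxy in front would then only need to honor whatever header the backend sends.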


pg 5325 days ago

I believe Rtm has already set one up.


wheels 5325 days ago

The conspicuous lack of a "Server:" header inclines me to believe that that's probably not the case (most web servers set one indicating the server software and version). Here are the headers that HN sends out from an old post (20 days ago):

  HTTP/1.1 200 OK
  Content-Type: text/html; charset=utf-8
  Cache-Control: private
  Connection: close
  Cache-Control: max-age=0
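To make the two observations concrete (no "Server:" field, and "Cache-Control" sent twice), here is a small sketch that parses the header dump above. The parser is deliberately simplistic and `parse_headers` is a name made up for this example.

```python
RAW = (
    "HTTP/1.1 200 OK\n"
    "Content-Type: text/html; charset=utf-8\n"
    "Cache-Control: private\n"
    "Connection: close\n"
    "Cache-Control: max-age=0\n"
)


def parse_headers(raw: str):
    """Return the status line and a dict of lowercased name -> list of values."""
    lines = raw.splitlines()
    fields = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        # Keep every occurrence, since a field may be repeated.
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return lines[0], fields


if __name__ == "__main__":
    status, fields = parse_headers(RAW)
    print(status)
    print("server" in fields)
    print(fields["cache-control"])
```

With "private" and "max-age=0" both present, a shared cache would have nothing it is allowed to store and reuse.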


bascule 5325 days ago

My favorite part of HN's headers: the lines are separated by naked LFs instead of CRLF, in violation of the HTTP spec


divtxt 5325 days ago

This is a common violation that everyone accepts. It's definitely done by 'bad' clients; I'm not sure how often servers send bare LF.

(I used to telnet to port 80 for testing, and type GET / HTTP/1.0 <enter> <enter>, and that should be LF on Linux & Mac)
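For anyone who wants to check a captured request or response for this, a quick sketch that counts line feeds not preceded by a carriage return. The helper name `count_bare_lfs` is made up, and it expects only the raw header bytes, not the body.

```python
import re

# A line feed that is not immediately preceded by a carriage return.
BARE_LF = re.compile(rb"(?<!\r)\n")


def count_bare_lfs(head: bytes) -> int:
    """Count bare-LF line endings in raw HTTP header bytes."""
    return len(BARE_LF.findall(head))


# What the spec asks for: CRLF after the request line and the blank line.
STRICT = b"GET / HTTP/1.0\r\n\r\n"
# What typing into telnet on Linux or Mac may send, per the comment above.
LENIENT = b"GET / HTTP/1.0\n\n"

if __name__ == "__main__":
    print(count_bare_lfs(STRICT))
    print(count_bare_lfs(LENIENT))
```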


bascule 5325 days ago

You don't have a problem with one of the most trafficked sites for programming/web startup-related news implementing HTTP incorrectly?

Do you ignore whether your HTML is valid just because the browser rendered it correctly?


snprbob86 5325 days ago

Yup.

I've got real work to do. Making a validator happy is fake work.


marshray 5325 days ago

You don't know that everyone accepts it. Even if they did, it doesn't make it right.


pufuwozu 5325 days ago

I submitted a patch for this in the pecl_http PHP library:

https://bugs.php.net/bug.php?id=58442


slyall 5325 days ago

We use Varnish for caching and check the User-Agent of each request.

If the cache has a copy of an article that is a few hours old, it will just give that version to Googlebot; if it thinks a human is requesting the page, it will go to the backend and fetch the latest version.

https://www.varnish-cache.org/lists/pipermail/varnish-misc/2...
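The decision described above can be sketched as plain logic. In a real Varnish setup this would live in VCL, not Python, and the six-hour staleness limit and the token list here are assumptions chosen only to illustrate "a few hours old" and "Googlebot".

```python
from typing import Optional

# Hypothetical values: substrings that mark a crawler, and how stale
# a cached copy may be before even crawlers get a fresh one.
CRAWLER_TOKENS = ("googlebot",)
STALE_OK_SECONDS = 6 * 3600


def serve_from_cache(user_agent: str, cached_age_seconds: Optional[int]) -> bool:
    """Return True if the cached copy should be served as-is.

    cached_age_seconds is None when there is no cached copy at all.
    """
    if cached_age_seconds is None:
        return False
    is_crawler = any(tok in user_agent.lower() for tok in CRAWLER_TOKENS)
    # Crawlers tolerate a stale page; humans always go to the backend.
    return is_crawler and cached_age_seconds <= STALE_OK_SECONDS


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(serve_from_cache(bot, 2 * 3600))
    print(serve_from_cache("Mozilla/5.0 (X11; Linux x86_64)", 2 * 3600))
```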


moe 5325 days ago

+1 for varnish. It's stupidly[1] fast and there shouldn't be much trickery required to deflect most of HN's traffic (e.g. ~10 sec expiry for "live" pages, infinite expiry for archived pages).

[1] 15k reqs/sec on a moderate box
