by 10165 3314 days ago
I know there was a discussion just yesterday about how awful AMP is, but it is still useful, e.g., for reading WSJ articles.

   curl -o 1.htm https://www.wsj.com/amp/articles/the-quants-run-wall-street-now-1495389108
   sed -n '/./{/<title/,/<\/title/p;/<p>/,/<\/p>/p;}' 1.htm > 2.htm
FWIW, 2.htm has no AMP elements, no JavaScript, no images, no ads, no externally sourced resources and therefore no tracking.

To append links to the non-essential images (instead of having the browser auto-load them), with captions where available:

  sed -n '
  /./{/div class=.image/,/<\/div/!d;s/ *//;}
  /src=/{s///;s/\"//g;s/.*/<a Href=&>&<\/a><br>/;}
  /alt=/{s///;s/[\">]//g;/./s/.*/<P>above: &<\/p>/;}
  /Href=/p;/<P>/p' 1.htm >> 2.htm
5 comments

Thanks very much for this script, it was really refreshing to read such a minimal webpage.

I added some bare-minimum CSS to make it a little nicer to read. Full command (with in-place sed):

    curl -o article.htm https://www.wsj.com/amp/articles/the-quants-run-wall-street-now-1495389108
    sed -n '/./{/<title/,/<\/title/p;/<p>/,/<\/p>/p;}' -i article.htm
    echo "<style>html { text-align: center; padding: 36px; } body { max-width: 600px; text-align: left; margin: auto; }</style>" >> article.htm
This is excellent. Most times I'm willing to live with just the text.

The article had some pictures and graphs (see archive.li someone else posted). They were nice but they weren't essential.

Can you explain to me what I am looking at here? (curl -o 1.htm... >2.htm)? And how I can use it to view an AMP page?
It's two separate lines.

The first line uses curl to download the AMP page and save it as 1.htm.

The second line uses sed to filter the HTML, keeping only the title and the paragraph elements, and writes the result to 2.htm.

2.htm is the page after that filtering; open it in a browser to read the article.
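If you want to see what the filter does without hitting the network, here is a minimal sketch on a made-up input. The file names sample.htm and filtered.htm and the page content are hypothetical, not taken from the WSJ page. It also assumes that opening and closing tags sit on separate lines: a sed range whose end is a regex only starts looking for the end on the line after the start, so a `<p>...</p>` on a single line would keep the range open until some later line containing `</p>`.

```shell
# Hypothetical stand-in for the downloaded 1.htm (content invented for illustration).
cat > sample.htm <<'EOF'
<html>
<head>
<title>
Example headline
</title>
<script>track();</script>
</head>
<body>
<div class="ad">Buy now</div>
<p>
First paragraph.
</p>

<amp-img src="x.png"></amp-img>
<p>
Second paragraph.
</p>
</body>
</html>
EOF

# Same filter as in the post: on non-empty lines, print the <title>..</title>
# range and every <p>..</p> range; -n suppresses all other output.
sed -n '/./{/<title/,/<\/title/p;/<p>/,/<\/p>/p;}' sample.htm > filtered.htm

cat filtered.htm
```

Only the title block and the two paragraph blocks should survive; the script, the ad div, the amp-img and the blank line are dropped.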
How is this different from using 'links' or 'w3m'?
Using links is both better and easier.

AMP HTML pages look great in links.

This made me wonder whether it would render well in Emacs using eww. It renders surprisingly well, actually. Odd that it doesn't show any pictures, but it is easy to read.
This is excellent, particularly for reading with lynx :)