| HN Mirror


by x3blah 2265 days ago

"It's text, just send the damn text."

They only send what the user requests.

Using a software program that makes automatic requests you cannot easily control, e.g., a popular web browser, might give the impression that they control what is sent.

They do not control what is sent. The user does.^1

The user makes a request and they send a response.

One of the requests a fully automatic web browser makes to NYT is to the host static01.nyt.com.

Personally, as a user who prefers text only, I need to make only this one request. As such, I don't really need a heavily marketed, fully automatic, graphical, ad-blocking web browser to make a single request for some text.^2

    #! /bin/sh

    case $1 in
    world        |w*)  x=world       # shortcut: w
    ;;us         |u*)  x=us          # shortcut: u
    ;;politics   |p*)  x=politics    # shortcut: p
    ;;nyregion   |n*)  x=nyregion    # shortcut: n
    ;;business   |bu*) x=business    # shortcut: bu
    ;;opinion    |o*)  x=opinion     # shortcut: o
    ;;technology |te*) x=technology  # shortcut: te
    ;;science    |sc*) x=science     # shortcut: sc
    ;;health     |h*)  x=health      # shortcut: h
    ;;sports     |sp*) x=sports      # shortcut: sp
    ;;arts       |a*)  x=arts        # shortcut: a
    ;;books      |bo*) x=books       # shortcut: bo
    ;;style      |st*) x=style       # shortcut: st
    ;;food       |f*)  x=food        # shortcut: f
    ;;travel     |tr*) x=travel      # shortcut: tr
    ;;magazine   |m*)  x=magazine    # shortcut: m
    ;;t-magazine |t-*) x=t-magazine  # shortcut: t-
    ;;realestate |r*)  x=realestate  # shortcut: r
    ;;*)
    echo "usage: $0 section" >&2
    sed -n '/x=/!d;s/.*x=//;/sed/!p' "$0" >&2
    exit 1
    esac

    curl -s https://static01.nyt.com/services/json/sectionfronts/$x/index.jsonp

Example: make a simple page of titles, article URLs and captions, assuming the script above is saved as "nyt".

    nyt tr |  sed '/\"headline\": \"/{s//<p>/;s/\".*/<\/p>/;p};/\"full\": \"/{s//<p>/;s/..$/<\/p>/;p};/\"link\": \"/{s///;s/ *//;s/\".*//;s|.*|<a href=&>&</a>|;p}' > travel.html

    firefox ./travel.html
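The section front comes back as JSONP, i.e., ordinary JSON wrapped in a JavaScript function call. The sed one-liner above works line by line on that text. A minimal sketch of an alternative first step strips the wrapper so the remaining JSON can go to a JSON-aware tool. It assumes the callback name and its opening parenthesis are on the first line and the closing parenthesis ends the last line; the exact layout of the NYT file is an assumption, and the helper name `strip_jsonp` is hypothetical.

```shell
# strip_jsonp: hypothetical helper. Reads JSONP on stdin, writes bare JSON.
# Assumption: "callback(" starts the first line and ")" (optionally
# followed by ";" and whitespace) ends the last line.
strip_jsonp() {
    # 1s: drop everything up to and including the first "(" on line 1.
    # $s: drop the closing ")" plus any trailing ";"/whitespace on the last line.
    sed '1s/^[^(]*(//;$s/)[;[:space:]]*$//'
}
```

For example, `nyt tr | strip_jsonp > travel.json`, after which a tool such as jq (if installed) can pick out fields instead of relying on line layout.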

Source: https://news.ycombinator.com/item?id=22125882

The truth is that they are just sending the damn text. However, you are voluntarily choosing to use a software program that automatically requests things other than the text of the article, i.e., "cruft".

1. The Google-sponsored HTTP/2 and HTTP/3 protocols seek to change this dynamic: their server push feature lets a server send resources the client never requested. So if websites sending you things you did not ask for bothers you, you might want to think about how online advertisers, and the companies that enable them, might use these new protocols.

2. However, I might use one for viewing images, watching video, reading PDFs, etc., offline. Web browsers are useful programs for consuming media. It is in the simple task of making HTTP requests that their utility has diminished over time. The user is not really in control.


gfxgirl 2264 days ago

I'm just as upset by bloat and tracking, but the criticism seems a little off for some reason I can't quite put my finger on.

I go to a restaurant and I can't just walk into the kitchen and grab a plate of food. Nor can I walk into the walk-in refrigerator, grab some supplies, and then walk over to the stations and start cooking. Instead I have to wait to be seated, order indirectly via a waiter, wait for the chef and staff to prepare my order, etc.

It seems to me visiting a website is similar. The user chooses to visit the site, and that choice includes the third parties and the reduced control. Just like I don't get to pick what sources the restaurant uses for its food, nor do I have any say in its hiring or management practices. Nor do I have any choice in the music they play or the TVs they have on (bar-like restaurants often have TVs). If I don't like their choices, my choice is to be or not be a customer. I don't get to hack around that by walking in the back door and taking the food.

I know the analogy isn't perfect. It's my computer, and I have no obligation to let them use it as they please rather than as I please. But still, there's some middle ground IMO between the two extremes.


dahauns 2264 days ago

>the criticism seems a little off for some reason I can't quite put my finger on.

IMO the reason is quite easy to put a finger on:

It's because it frames the problem squarely as the user's problem, culminating in the phrase that one is "voluntarily choosing".

If you don't want to do research and customize scripts for every friggin' domain/website (and redo them whenever the site structure changes), there is no "voluntary choice".

If you don't want to accept that this "solution" has to forgo a lot of essential characteristics of hypermedia, there is no "voluntary choice".

If you're not technically versed in these things, there never was a "voluntary choice" to begin with.

In general, if you want to use the World Wide Web even remotely as it is intended, there is no "voluntary choice".


TeMPOraL 2264 days ago

> I'm just as upset by bloat and tracking, but the criticism seems a little off for some reason I can't quite put my finger on.

I think you're unclear in your mind about the relationship between you and the website you visit.

To use your restaurant analogy, browsing the web is more like ordering delivery. You send a request for food from the menu along with money to cover it, and a while later a driver with a bag arrives at your doorstep. That bag contains the food you ordered, some packaging, often plastic cutlery, and some advertising. The transaction between you and the restaurant involved exchanging money for food, and the restaurant doesn't get any further say in what you do with that food. You're free to throw the box, the cutlery and the advertising leaflets into the bin, and give half of the food to your cat. They cannot, technically or ethically, make you eat the food out of the box it came in while reading the advertising leaflets.

It's like that with web browsers. You ask for content (via HTTP), and you get a response that includes links to other things you're invited to request. You're free to cut the response up and render it however you like, and you're free to request or not request the other linked resources. That was how the web was designed to work, and that's how the HTTP protocol is meant to be used. Plenty of websites will try to insist they're more like dining in than delivery, but that's just them trying to guilt-trip you into making them more money. It's not something they're entitled to.
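To make "cut the response up and render it the way you like" concrete, here is a minimal sketch that reduces an HTML response to its visible text with sed. It assumes every tag fits on a single line; it is a crude filter rather than an HTML parser (tags split across lines, `<script>` bodies and entities are not handled), and the helper name `html_to_text` is hypothetical.

```shell
# html_to_text: hypothetical, deliberately naive "renderer".
# 1) delete anything between "<" and ">" on a line (single-line tags only)
# 2) drop lines left empty or whitespace-only after tag removal
html_to_text() {
    sed -e 's/<[^>]*>//g' -e '/^[[:space:]]*$/d'
}
```

For example, `curl -s https://example.com/ | html_to_text` makes exactly one HTTP request; none of the images, scripts or stylesheets the page references are fetched unless you ask for them.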


advertiser 2264 days ago

Chrome, Firefox, Safari and Edge all include the ability to block specific requests.

https://developers.google.com/web/tools/chrome-devtools/netw...

https://developer.mozilla.org/en-US/docs/Tools/Network_Monit...

https://developer.apple.com/documentation/safariservices/cre...

https://docs.microsoft.com/en-us/microsoft-edge/devtools-gui...

One of the web's first browsers, the line-mode browser, could probably request the text, and only the text, of an article from nytimes.com. It has no capability to automatically follow links to ads and trackers.

https://www.w3.org/INSTALL.html


x3blah 2264 days ago

The long sed line above is out of date and thus "broken". For something simpler that works, try this:

   nyt tr |sed 's/ *//;/</!d'|uniq > travel.html

This will produce a simple web page of titles and URLs for each article page.

An interesting point of discussion might be the amount of third-party cruft on the template article page versus the more dynamic front page. When JavaScript is disabled, all images on each article page display and there are no ads. Downloading any video in the page is as simple as:

   curl --remote-name-all $(grep -o 'https://[^"]*\.mp4' article.html)
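A page can reference the same video several times (e.g., in both a `<video>` and a `<source>` tag), so a slightly more defensive variant deduplicates the URLs first. This is a sketch assuming the page embeds absolute `https://` `.mp4` URLs inside double-quoted attributes; `grep -o` is a GNU/BSD extension rather than POSIX, and the helper name `mp4_urls` is hypothetical.

```shell
# mp4_urls: hypothetical helper. Prints each distinct absolute .mp4 URL
# found in a saved page (default file: article.html), one per line.
# The pattern is single-quoted so the shell cannot glob-expand [^"]*.
mp4_urls() {
    grep -o 'https://[^"]*\.mp4' "${1:-article.html}" | sort -u
}
```

Usage: `mp4_urls article.html | xargs -n 1 curl -O` runs one curl per URL, so each download gets its own `-O`.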


taneq 2264 days ago

You're right, I guess, if you consider "the web" to mean "HTML over HTTP". In real terms, though, a modern web site is the HTML plus all of the images, scripts and other files that go with it, and it's designed as a package. The fact that the web browser connects to the web server to download all of the bloat doesn't change the fact that the bloat was specified by the HTML served by the web site. It's just an implementation detail.
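One way to see what the HTML specifies is to list the hosts a saved page names in its `src` attributes, without fetching any of them. This is a rough sketch assuming absolute `http://` or `https://` URLs in double-quoted `src="..."` attributes; relative URLs, `srcset`, CSS and requests added later by scripts are not counted, so it undercounts what a browser would actually contact. The helper name `src_hosts` is hypothetical.

```shell
# src_hosts: hypothetical helper. Lists distinct hosts named in absolute
# src="http(s)://..." attributes of an HTML file, i.e. servers a browser
# would be asked to contact for embedded resources.
src_hosts() {
    # grep -o keeps only 'src="http(s)://host'; sed strips up to "://".
    grep -o 'src="https\{0,1\}://[^/"]*' "$1" | sed 's|.*://||' | sort -u
}
```

For example, `src_hosts article.html | wc -l` gives the number of distinct hosts the page asks for.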


avn2109 2265 days ago

Pro comment here which should be way higher up the page. Good comment content, good Unixbeard vibe, great use of sed.
