by billyhoffman 1183 days ago

I too was using text-only versions of sites like CNN, Reuters, or Christian Science Monitor[1], and they were fine. But what I really wanted was to turn any news website into a text-only website.

So I built NewsWaffle, which for any website:

https://github.com/acidus99/NewsWaffle

* Automatically builds a list of news stories, separate from the navigational hyperlinks.

* Detects RSS/Atom feeds to provide a more accurate list of news stories.

* Uses Readability to show only article content on article pages.

* Uses metadata like OpenGraph or Twitter cards to provide richer formatting, and to determine page type.
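As a rough illustration of the feed-detection and metadata steps, here is a minimal Python sketch. It is not NewsWaffle's actual code; the class name `HeadMetadataParser`, the function `inspect_page`, and the rule that `og:type` equal to `article` means an article page are all assumptions made for the example.

```python
from html.parser import HTMLParser

# MIME types commonly used to advertise RSS/Atom feeds in <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class HeadMetadataParser(HTMLParser):
    """Collects advertised feed URLs and OpenGraph properties (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.feeds = []      # href values of advertised RSS/Atom feeds
        self.opengraph = {}  # e.g. {"og:type": "article"}

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rels = attr.get("rel", "").lower().split()
            is_feed = attr.get("type", "").lower() in FEED_TYPES
            if "alternate" in rels and is_feed and attr.get("href"):
                self.feeds.append(attr["href"])
        elif tag == "meta":
            prop = attr.get("property", "")
            if prop.startswith("og:"):
                self.opengraph[prop] = attr.get("content", "")


def inspect_page(html):
    """Return (feeds, opengraph, page_type) for an HTML string."""
    parser = HeadMetadataParser()
    parser.feed(html)
    parser.close()
    # Assumption: treat og:type == "article" as an article page, else a links page.
    is_article = parser.opengraph.get("og:type") == "article"
    page_type = "article" if is_article else "links"
    return parser.feeds, parser.opengraph, page_type
```

A page whose head advertises a feed gives the tool a cleaner story list than scraping hyperlinks, which is presumably why feeds are preferred when present.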

It regularly converts 900 KB home pages or 1.2 MB news articles into 3 KB of links to news stories and 5 KB of article text, respectively.

It does this by:

* Using semantic tags like <header>, <footer>, and <nav> to determine which hyperlinks are navigational and which ones are likely links to news articles.

* Using OpenGraph metadata to determine the page type and to pull extra metadata for news stories.

* Using an aggressive HTML parser that strips out a ton of tags, CSS, JS, etc.

* Using the Readability library to extract the text of news articles.
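The semantic-tag idea in the first bullet can be sketched in a few lines of Python. This is only an illustration under assumptions, not the project's implementation: `LinkClassifier` and `classify_links` are hypothetical names, and the rule used here (any link inside <header>, <footer>, or <nav> is navigational, everything else is a candidate story) is a simplification.

```python
from html.parser import HTMLParser

# Containers whose links are assumed to be navigational.
NAV_CONTAINERS = {"header", "footer", "nav"}


class LinkClassifier(HTMLParser):
    """Splits hyperlinks into navigational and candidate story links (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0     # how many nav-like containers are currently open
        self.current = None    # (href, text_parts, in_nav) for the open <a>, if any
        self.nav_links = []
        self.story_links = []

    def handle_starttag(self, tag, attrs):
        if tag in NAV_CONTAINERS:
            self.nav_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.current = (href, [], self.nav_depth > 0)

    def handle_data(self, data):
        if self.current is not None:
            self.current[1].append(data)

    def handle_endtag(self, tag):
        if tag in NAV_CONTAINERS and self.nav_depth > 0:
            self.nav_depth -= 1
        elif tag == "a" and self.current is not None:
            href, parts, in_nav = self.current
            text = " ".join("".join(parts).split())  # collapse whitespace
            target = self.nav_links if in_nav else self.story_links
            target.append((href, text))
            self.current = None


def classify_links(html):
    """Return (nav_links, story_links) as lists of (href, text) tuples."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.nav_links, parser.story_links
```

Tracking a depth counter rather than a boolean keeps nested containers (a <nav> inside a <header>, say) from ending the navigational region too early.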

I built this as a service in Gemini, so if you have a Gemini browser you can try it. Otherwise, here is an HTTP-to-Gemini proxy showing you what the NYT homepage and an NYT article look like:

Gemini link: gemini://gemi.dev/cgi-bin/waffle.cgi/

NYT Homepage: https://portal.mozz.us/gemini/gemi.dev/cgi-bin/waffle.cgi/li...

NYT Article: https://portal.mozz.us/gemini/gemi.dev/cgi-bin/waffle.cgi/ar...

[1] https://www.csmonitor.com/text_edition

6 comments

basch 1183 days ago

Pretty amazing.

I tested aldaily.com and had trouble navigating to get to the articles. Allsides.com worked. Techmeme.com did not work.

gemini://gemi.dev/cgi-bin/waffle.cgi/links?https%3A%2F%2Fallsides.com%2F

https://portal.mozz.us/gemini/gemi.dev/cgi-bin/waffle.cgi/li...

billyhoffman 1183 days ago

Thanks for letting me know. aldaily works great in raw mode:

gemini://gemi.dev/cgi-bin/waffle.cgi/raw?https%3A%2F%2Fwww.aldaily.com%2F

Clicking the "more" links, which take you to the news articles, also works properly.

(You can get to raw mode by clicking "Force article view" and then "raw mode." I should probably expose that in other places.)

NewsWaffle tries to determine the type of page. Articles get displayed with content run through Readability, and then the HTML is stripped down. If it's a "links" page, like the home or section page on a news site, it uses HTML elements to try to tell links to news stories apart from navigational links to other parts of the site. Part of that is looking for links with longer text, since link text for news stories tends to run at least a few words. (This helps sort "About Us" from "New Fusion Experiment a Success".) I'll check into why aldaily isn't working properly.
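To make the link-text heuristic concrete, here is a tiny Python sketch. The function names and the threshold of four words are assumptions chosen for illustration; the comment above does not say what cutoff NewsWaffle actually uses.

```python
def looks_like_story_link(link_text, min_words=4):
    """Guess whether link text is a headline rather than a navigation label.

    min_words=4 is an assumed cutoff, not a value taken from NewsWaffle.
    """
    return len(link_text.split()) >= min_words


def split_by_text_length(link_texts, min_words=4):
    """Partition link texts into (story_candidates, navigation_candidates)."""
    stories = [t for t in link_texts if looks_like_story_link(t, min_words)]
    navigation = [t for t in link_texts if not looks_like_story_link(t, min_words)]
    return stories, navigation
```

With this cutoff, "About Us" (two words) lands in the navigation bucket and "New Fusion Experiment a Success" (five words) is kept as a story candidate.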

Sorry I can't seem to reproduce the Techmeme issue. It works for me:

gemini://gemi.dev/cgi-bin/waffle.cgi/view?https%3A%2F%2Fwww.techmeme.com
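The Gemini links in this thread all follow the same pattern: a mode segment (`links`, `raw`, or `view`) followed by the target URL, percent-encoded, as the query string. A small Python sketch for building such links yourself, where `waffle_url` is a hypothetical helper and the pattern is inferred only from the URLs posted here:

```python
from urllib.parse import quote

# Base path taken from the links posted in this thread.
WAFFLE_BASE = "gemini://gemi.dev/cgi-bin/waffle.cgi"


def waffle_url(mode, target_url):
    """Build a NewsWaffle Gemini URL for a mode such as "links", "raw", or "view".

    safe="" makes quote() encode ":" and "/" too, matching the links above.
    """
    return f"{WAFFLE_BASE}/{mode}?{quote(target_url, safe='')}"
```

For example, `waffle_url("raw", "https://www.aldaily.com/")` reproduces the raw-mode aldaily link posted above.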

basch 1183 days ago

Do the techmeme links click through?

sgtnasty 1183 days ago

This is fantastic; now I can view news in Gemini all day. Thank you. We need more Gemini sites or tools to convert HTML to it.

reaperducer 1183 days ago

What are you using for a Gemini client? Lynx handles Gopher URLs, so I presumed it would be OK with Gemini, but no luck.

Any suggestions?

billyhoffman 1183 days ago

For the terminal, I use amfora: https://github.com/makew0rld/amfora

For a GUI, I use Lagrange: https://github.com/skyjake/lagrange

Lagrange is sort of the Netscape of Gemini. It works on all the major desktop and mobile OSes. Personally, I prefer Elaho (iOS) or Buran (Android) for mobile.

JasonFruit 1183 days ago

Absolutely great! It makes https://antiwar.com work better than the actual website.

wolverine876 1183 days ago

Great!

A request: In the linked NY Times front page, more formatting for the article list, maybe blank lines between articles. Visually, it's a challenge.

muyuu 1183 days ago

I didn't know I needed this so much.

Wistar 1183 days ago

This is excellent! Wow.
