I too was using text-only versions of sites like CNN, Reuters, or Christian Science Monitor[1], and they were fine. But what I really wanted was to turn any news website into a text-only website. So I built NewsWaffle (https://github.com/acidus99/NewsWaffle), which for any website:

* Automatically builds a list of news stories, separate from the navigational hyperlinks.
* Detects RSS/Atom feeds to provide a more accurate list of news stories.
* Uses Readability to show only article content on article pages.
* Uses metadata like OpenGraph or Twitter cards to provide richer formatting, and to determine page type.

It regularly converts 900 KB home pages into 3 KB of links to news stories, and 1.2 MB news articles into 5 KB of text.

It does this by:

* Using semantic tags like <header>, <footer>, and <nav> to determine which hyperlinks are navigational and which are likely links to news articles.
* Using OpenGraph metadata to determine page type and to pick up extra metadata for news stories.
* Using an aggressive HTML parser that strips out a ton of tags, CSS, JS, etc.
* Using the Readability library to extract the text of news articles.

I built this as a service in Gemini, so if you have a Gemini browser you can try it. Otherwise, here is an HTTP-to-Gemini proxy showing you what a NYT article looks like:

Gemini link: gemini://gemi.dev/cgi-bin/waffle.cgi/
NYT Homepage: https://portal.mozz.us/gemini/gemi.dev/cgi-bin/waffle.cgi/li...
NYT Article: https://portal.mozz.us/gemini/gemi.dev/cgi-bin/waffle.cgi/ar...

[1] https://www.csmonitor.com/text_edition
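To make the semantic-tag idea concrete, here is a minimal sketch in Python of splitting links into navigational and likely-story links. It is not NewsWaffle's actual code: the class name `LinkClassifier`, the helper `classify_links`, and the exact tag set are assumptions for illustration, and a real page would need more heuristics than this.

```python
from html.parser import HTMLParser

# Containers treated as navigational (an assumption for this sketch).
NAV_TAGS = {"header", "footer", "nav"}


class LinkClassifier(HTMLParser):
    """Split hyperlinks into navigational links and likely content links."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0       # number of nav-like containers currently open
        self.nav_links = []
        self.content_links = []

    def handle_starttag(self, tag, attrs):
        if tag in NAV_TAGS:
            self.nav_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Any link nested inside a nav-like container counts as navigation.
                target = self.nav_links if self.nav_depth else self.content_links
                target.append(href)

    def handle_endtag(self, tag):
        if tag in NAV_TAGS and self.nav_depth > 0:
            self.nav_depth -= 1


def classify_links(html):
    """Return (nav_links, content_links) for an HTML string."""
    parser = LinkClassifier()
    parser.feed(html)
    parser.close()
    return parser.nav_links, parser.content_links


# Hypothetical page used only to exercise the sketch.
page = """
<header><a href="/">Home</a></header>
<nav><a href="/world">World</a><a href="/sports">Sports</a></nav>
<main><a href="/2024/story-one">Story one</a></main>
<footer><a href="/about">About</a></footer>
"""
nav, content = classify_links(page)
```

Tracking a depth counter rather than a boolean keeps the sketch correct when a <nav> sits inside a <header>, which is common on news home pages.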
I tested aldaily.com and had trouble navigating to get to the articles. Allsides.com worked. Techmeme.com did not work.
gemini://gemi.dev/cgi-bin/waffle.cgi/links?https%3A%2F%2Fallsides.com%2F
https://portal.mozz.us/gemini/gemi.dev/cgi-bin/waffle.cgi/li...
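For anyone wanting to try other sites: the Gemini link above appears to pass the target site as a percent-encoded query string. A minimal sketch in Python, assuming the `links?` endpoint takes any site URL the same way (the helper name `waffle_links_url` is hypothetical):

```python
from urllib.parse import quote

# Base taken from the Gemini link above; that it accepts arbitrary
# sites in the same form is an assumption.
WAFFLE_LINKS = "gemini://gemi.dev/cgi-bin/waffle.cgi/links"


def waffle_links_url(site_url):
    """Build a links-page URL by percent-encoding the whole target URL."""
    # safe="" forces ":" and "/" to be encoded too, matching the link above.
    return WAFFLE_LINKS + "?" + quote(site_url, safe="")


allsides = waffle_links_url("https://allsides.com/")
```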