by franciscop
1376 days ago
Thanks for the feedback! So far I plan on making this a stepping stone toward a fully integrated HN reader, where you can read the whole thing in-page, and pages that cannot be parsed (paywalls etc.) just redirect to the original. I prefer not to circumvent any barriers or hide the user agent for that, so in that situation I'll just redirect to the original.

I should also find a better html-to-markdown parser, thanks for the recommendation there! From the "example", yes, you guessed "readability" perfectly. And for downloading the page it's just fetch() + jsdom.

Suggestions:

- [JS]: I use fetch + jsdom, so no JS is executed at all! I've found that most content-heavy pages (articles, blog posts, etc.) are server-side rendered. I haven't tested too many, but so far there have been no issues without JS. I might move to puppeteer at some point, either for pages where the jsdom parse fails or for a domain whitelist, if I end up keeping one.

- [header]: Already mentioned.

- [Front matter]: Right now I'm actually returning two custom headers, `title` and `url`, and might add more in the future. I did consider front matter, but I want to keep the body as "raw" as possible.

Edit: what I'm considering next is an endpoint to download articles with basic HTML styling, or as PDF/EPUB.
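
For illustration only, the redirect fallback and the `title`/`url` headers described in the comment could be sketched roughly like this. This is not the project's actual code: `buildResponse`, the shape of `article`, the 302 status, the content type, and the percent-encoding of the title are all assumptions made for the example.

```javascript
// Hypothetical helper, not taken from the project.
// `article` is whatever the parse step produced ({ title, markdown }),
// or null when the page could not be parsed.
function buildResponse(originalUrl, article) {
  // Unparseable page (paywall, empty extraction, ...): redirect to the original.
  if (!article || !article.markdown) {
    return { status: 302, headers: { location: originalUrl }, body: "" };
  }

  return {
    status: 200,
    headers: {
      "content-type": "text/markdown; charset=utf-8",
      // Assumption: percent-encode the title so non-ASCII characters are header-safe.
      title: encodeURIComponent(article.title ?? ""),
      url: originalUrl,
    },
    // The body stays raw markdown, with no front matter prepended.
    body: article.markdown,
  };
}

console.log(buildResponse("https://example.com/post", { title: "Hello", markdown: "# Hello" }));
console.log(buildResponse("https://example.com/paywalled", null));
```

Keeping the metadata in headers means a client can pipe the body straight into a markdown renderer without stripping anything first. The trade-off is that a client has to decode the `title` header itself.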