by throwup238 593 days ago
This is probably out of scope for your tool, but it’d be nice to have built-in n-gram deduplication: when pointed at a few of these markdown files, the tool would strip any content that is identical across them in the header and footer, such as navigation.
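
To make the idea concrete, here is a rough sketch under some loud assumptions. It is not part of the tool, and the function name `strip_shared_header_footer` is made up for illustration. It also swaps true n-gram matching for something simpler: it only drops whole lines that are identical at the very top or very bottom of every page, so boilerplate that differs slightly per page (an "active" nav link, say) would survive.

```python
def strip_shared_header_footer(docs: list[str]) -> list[str]:
    """Drop leading/trailing lines that are identical across ALL docs.

    Hypothetical helper: a line-level simplification of the n-gram idea.
    """
    if len(docs) < 2:
        return list(docs)  # nothing to compare against

    pages = [d.splitlines() for d in docs]
    shortest = min(len(p) for p in pages)

    # Count leading lines shared by every page.
    head = 0
    while head < shortest and all(p[head] == pages[0][head] for p in pages):
        head += 1

    # Count trailing lines shared by every page, never overlapping the header.
    tail = 0
    while tail < shortest - head and all(
        p[-1 - tail] == pages[0][-1 - tail] for p in pages
    ):
        tail += 1

    return ["\n".join(p[head:len(p) - tail]) for p in pages]


if __name__ == "__main__":
    page_a = "Home | About\n# Page A\nText A\nFooter"
    page_b = "Home | About\n# Page B\nText B\nMore\nFooter"
    for body in strip_shared_header_footer([page_a, page_b]):
        print(body)
        print("---")
```

Requiring a match on every page keeps it conservative; a real version would probably want a threshold (e.g. "appears on most pages") and fuzzier matching.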
1 comment

My final university project was about cleaning up the HTML nodes before sending them to the HTML-to-Markdown converter. But that was extremely difficult and depended on heuristics that had to be tweaked.

Your idea of comparing multiple pages would be a great approach. It would be amazing if you built something like this! It would enable so many more use cases... For example, a better “send to kindle” (see the other comment from rty32 [1]).

[1] https://news.ycombinator.com/item?id=42093964