by JohannesKauf 731 days ago
Cool to see another library in this space!

I see that you took the test cases from Turndown. However, Turndown isn’t actually that accurate, which is especially noticeable when converting entire websites.

The best comparison would be against Pandoc, which is (in my opinion) the best HTML-to-Markdown converter right now.

That said, it is extremely difficult to handle every edge case. As an example, this usually causes problems:

  <p>nitty<em>-gritty-</em>details</p>
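
One likely reason, going by the CommonMark emphasis rules rather than anything specific to Turndown or Pandoc: a naive conversion produces `nitty*-gritty-*details`, where the first `*` sits between a letter and punctuation and the second between punctuation and a letter. Under the "flanking" rules, neither can act as the delimiter it is supposed to be, so the asterisks would render literally. Here is a minimal Python sketch of that check. The function names are made up for illustration, it only looks at single `*` characters, and it treats Unicode categories P and S as punctuation (an assumption that matches recent spec versions).

```python
import unicodedata


def is_whitespace(ch: str) -> bool:
    # The start or end of the text (empty string) counts as whitespace here.
    return ch == "" or ch.isspace()


def is_punctuation(ch: str) -> bool:
    # Assumption: punctuation means Unicode general categories P* and S*.
    return ch != "" and unicodedata.category(ch)[0] in ("P", "S")


def left_flanking(before: str, after: str) -> bool:
    # A "*" can open emphasis only if it is left-flanking.
    if is_whitespace(after):
        return False
    if not is_punctuation(after):
        return True
    return is_whitespace(before) or is_punctuation(before)


def right_flanking(before: str, after: str) -> bool:
    # A "*" can close emphasis only if it is right-flanking.
    if is_whitespace(before):
        return False
    if not is_punctuation(before):
        return True
    return is_whitespace(after) or is_punctuation(after)


def star_flanking(markdown: str) -> list[tuple[int, bool, bool]]:
    """Return (index, can_open, can_close) for every single "*" in the text."""
    results = []
    for i, ch in enumerate(markdown):
        if ch == "*":
            before = markdown[i - 1] if i > 0 else ""
            after = markdown[i + 1] if i + 1 < len(markdown) else ""
            results.append(
                (i, left_flanking(before, after), right_flanking(before, after))
            )
    return results


# The naive output: the first "*" can only close and the second can only open,
# so no emphasis is formed.
print(star_flanking("nitty*-gritty-*details"))
# A well-formed case for comparison.
print(star_flanking("nitty *gritty* details"))
```

A converter has to notice this situation and emit something else, which is what makes the case tricky.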

Note: Six years ago I open-sourced a Golang library [1]. I am currently rewriting it completely with the aim of making it even better than Pandoc, and I wrote about the edge cases I encountered [2].

[1] https://github.com/JohannesKaufmann/html-to-markdown

[2] https://html-to-markdown.com/edge-cases

1 comment

Thanks for the information! This is really helpful. Glad to know about these resources for improving it.