by threecheese 484 days ago

Interesting. One thing you may have learned while researching existing tools and libraries: many of them serialize Markdown to HTML before running structured extraction or manipulation, even for stuff like converting to PDF.

The core assumption here is that Markdown was (and is) designed to be serializable to HTML. This is why a Markdown document/AST is mostly not a tree, even for tree-ish elements such as sub-sections. Instead it is flat: an array of elements in order of appearance in the document. Apparently this most closely matches the structure of HTML, at both the block and inline levels. Only lists and blockquotes (AFAIR) support nesting.

Ex: h1 -> paragraph -> h2 -> paragraph is not nested; it is an array of four elements in document order.
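To make the flat-vs-nested point concrete, here is a minimal sketch in Python. The `Block` and `Section` types and the `nest_sections` helper are made up for illustration and are not any particular parser's API; the sketch only shows that section nesting has to be reconstructed from heading levels, since the flat array doesn't carry it.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One element of the flat document array (hypothetical shape)."""
    kind: str       # "heading" or "paragraph"
    text: str
    level: int = 0  # heading level 1-6; 0 for non-headings


@dataclass
class Section:
    """A reconstructed section: a heading plus everything under it."""
    title: str
    level: int
    body: list = field(default_factory=list)      # non-heading blocks
    children: list = field(default_factory=list)  # nested Sections


def nest_sections(blocks):
    """Rebuild a section tree from a flat, ordered list of blocks."""
    root = Section(title="", level=0)
    stack = [root]  # currently open sections, outermost first
    for block in blocks:
        if block.kind == "heading":
            # Close every open section at the same or a deeper level.
            while stack[-1].level >= block.level:
                stack.pop()
            section = Section(title=block.text, level=block.level)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            # Non-heading blocks belong to the innermost open section.
            stack[-1].body.append(block)
    return root


# The example above: four siblings, no nesting in the source structure.
flat = [
    Block("heading", "Title", 1),
    Block("paragraph", "Intro text."),
    Block("heading", "Details", 2),
    Block("paragraph", "More text."),
]
tree = nest_sections(flat)
```

Here `flat` has four top-level entries, while `tree` has a single child ("Title") that contains "Details". The nesting exists only after the extra pass.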

Anyway, you might throw a task at Cursor or Copilot to see how an equivalent implementation using HTML fares against your test suite; you may be able to develop more quickly that way.
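As a rough idea of what the HTML route could look like, here is a small sketch that pulls headings out of already-rendered HTML using only Python's standard-library `html.parser`. It assumes the HTML string comes from whatever Markdown renderer you already use; `OutlineParser` and `extract_headings` are hypothetical names, not part of any existing tool.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects (level, text) for each heading, in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # heading tag currently open, if any
        self._buffer = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Inline markup inside a heading still reaches us as plain text.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            self.headings.append((int(tag[1]), text))
            self._current = None


def extract_headings(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.headings


# Assumed renderer output for: h1 -> paragraph -> h2 -> paragraph
sample_html = (
    "<h1>Title</h1><p>Intro text.</p>"
    "<h2>Details</h2><p>More <em>text</em>.</p>"
)
headings = extract_headings(sample_html)
```

For the sample above, `headings` comes out as `[(1, "Title"), (2, "Details")]`, again a flat list in document order.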