by simonw 729 days ago
Yeah, this. Markdown uses fewer tokens than HTML, and most LLMs have been trained on large amounts of Markdown.

That's why tools like this exist: https://jina.ai/reader/

Demo: https://r.jina.ai/https://news.ycombinator.com/item?id=40695...

1 comment

Additionally, when you have strict input token limits, it’s way easier to chunk Markdown while keeping track of context than it is to chunk HTML at all.
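
To make the chunking point concrete, here is a rough sketch of heading-aware chunking in Python. It is not how any particular tool does it. The function name `chunk_markdown`, the `max_chars` parameter, and the use of a character count as a stand-in for a token budget are all assumptions made for illustration.

```python
import re

# ATX-style headings only: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text, max_chars=500):
    """Split Markdown into chunks, each tagged with its heading path.

    max_chars is a hypothetical stand-in for a token budget; swap in a
    real tokenizer count if you need exact limits.
    """
    chunks = []
    path = []  # (level, title) pairs for the headings currently in scope
    buf = []   # body lines collected under the current heading path

    def flush():
        body = "\n".join(buf).strip()
        buf.clear()
        if not body:
            return
        context = " > ".join(title for _, title in path)
        # Split oversized sections on blank lines so paragraphs stay whole.
        piece = ""
        for para in re.split(r"\n\s*\n", body):
            if piece and len(piece) + len(para) + 2 > max_chars:
                chunks.append({"context": context, "text": piece})
                piece = para
            else:
                piece = f"{piece}\n\n{para}" if piece else para
        chunks.append({"context": context, "text": piece})

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # emit the previous section under the old heading path
            level = len(match.group(1))
            # Drop headings at the same or deeper level, then push the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    sample = "# Guide\n\nIntro.\n\n## Install\n\nRun the installer.\n"
    for chunk in chunk_markdown(sample):
        print(chunk["context"], "|", chunk["text"])
```

Each chunk carries its heading trail (for example `Guide > Install`), so it can be sent to a model on its own without losing where it came from. Known gaps in this sketch: a `#` line inside a fenced code block would be misread as a heading, and a single paragraph longer than `max_chars` is kept as one oversized chunk. Doing the same with HTML means parsing nested tags first, which is the extra work the comment above is getting at.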