Hacker News
by kgeist 589 days ago
I remember there was a paper which found that LLMs understand HTML pretty well, so you don't need additional preprocessing. The downside is that HTML produces more tokens than Markdown.

Right: the token savings can be enormous here.

Use https://tools.simonwillison.net/jina-reader to fetch the https://news.ycombinator.com/ homepage as Markdown and paste it into https://tools.simonwillison.net/claude-token-counter - 1550 tokens.

Same thing as HTML: 13367 tokens.
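
To put those two counts side by side, here is a quick sketch of the arithmetic. The helper name `token_savings` is made up for illustration, and the only inputs are the two token counts quoted above.

```python
def token_savings(markdown_tokens: int, html_tokens: int) -> tuple[float, float]:
    """Return (how many times larger the HTML is, fraction of tokens saved by Markdown)."""
    ratio = html_tokens / markdown_tokens
    saved_fraction = 1 - markdown_tokens / html_tokens
    return ratio, saved_fraction


# Counts from the comment above: Markdown vs. HTML for the same page.
ratio, saved = token_savings(1550, 13367)
print(f"HTML is {ratio:.1f}x the tokens; Markdown saves {saved:.0%}")
```

So for this one page the HTML costs roughly 8.6 times as many tokens, or put the other way, the Markdown version drops about 88% of them. That's a single measurement on a single page, so other sites may look quite different.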