Hacker News
by gwerbin 61 days ago
> Plus, isn't the appeal of LLMs broadly that they can do somewhat-useful things with mostly-arbitrary input (if you ignore the risk of prompt injection)?

They can definitely read HTML, but they do better with more structure. For example, I proposed in a sibling comment that the "reader mode" feature in browsers might be a great LLM-compatibility feature, since it cuts out most of the HTML token noise. Another option is exposing an HTTP API with an OpenAPI schema, a proper sitemap, and an RSS feed. Fetching from an RSS feed, for example, can be exposed to the LLM as a "tool" that it can call.
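To make the RSS-as-a-tool idea concrete, here is a rough sketch using only the Python standard library. The names `FETCH_FEED_TOOL`, `parse_rss`, and `fetch_feed` are made up for illustration, and the tool description just follows the usual JSON Schema shape rather than any particular vendor's API, so treat it as a starting point and not a reference implementation.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Hypothetical tool description an LLM runtime could be given.
# The layout is generic JSON Schema, not any specific vendor's format.
FETCH_FEED_TOOL = {
    "name": "fetch_feed",
    "description": "Return the latest items from an RSS 2.0 feed.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Feed URL"},
            "limit": {"type": "integer", "minimum": 1},
        },
        "required": ["url"],
    },
}


def parse_rss(xml_text, limit=5):
    """Turn RSS 2.0 XML into a short list of plain dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        })
        if len(items) >= limit:
            break
    return items


def fetch_feed(url, limit=5):
    """What the tool call would run: download the feed, then parse it."""
    with urlopen(url, timeout=10) as resp:
        return parse_rss(resp.read(), limit)
```

The point is that the model only ever sees a handful of small dicts (title, link, date) instead of a whole page of markup. Only feed this XML from sources you trust, since the standard-library parser is not hardened against malicious input.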

1 comment

I don't think it's fair to say that HTML is less structured than Markdown. Markdown is derived from a simplified subset of HTML, and, having cut my teeth on HTML5 when it was still new, I remember a huge emphasis on the idea of the semantic web conveyed through HTML.
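As a rough illustration of how much structure semantic HTML5 already carries, here is a sketch that keeps only the text inside `<main>` or `<article>` and skips `<nav>`, `<aside>`, `<footer>`, `<script>`, and `<style>`. The class and function names are hypothetical, and this is a much cruder heuristic than what a browser's reader mode actually does; it also assumes the page uses these elements as intended.

```python
from html.parser import HTMLParser

CONTENT_TAGS = {"article", "main"}
NOISE_TAGS = {"nav", "aside", "footer", "script", "style"}


class SemanticTextExtractor(HTMLParser):
    """Collect text that sits inside content tags and outside noise tags."""

    def __init__(self):
        super().__init__()
        self.content_depth = 0  # how many <main>/<article> we are inside
        self.noise_depth = 0    # how many noise elements we are inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in CONTENT_TAGS:
            self.content_depth += 1
        elif tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTENT_TAGS and self.content_depth:
            self.content_depth -= 1
        elif tag in NOISE_TAGS and self.noise_depth:
            self.noise_depth -= 1

    def handle_data(self, data):
        if self.content_depth and not self.noise_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_main_text(html):
    parser = SemanticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For a page like `<nav>Home</nav><main><article><h1>Title</h1><p>Body text.</p></article></main>`, this keeps the heading and paragraph text and drops the navigation, purely by relying on the tags' meaning.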