| You're absolutely right that existing standards like sitemap.xml + Accept headers could work in theory, but here's why we built this for eCommerce specifically:
The HTML-to-Markdown Problem
Even with Accept: text/markdown, most eCommerce sites will return HTML (then converted server-side). This means:
- Scripts/popups in <div> hell ("Subscribe to newsletter!" embedded in product specs)
- Ad fragments ("Customers also bought...") polluting context windows
- Layout cruft (header/footer markup in every response)
llms.txt files are handcrafted Markdown: no noise, just atomic product data.
Control Over Exposure
Retailers want to:
- Expose only approved fields (e.g., hide "Compare at $X" prices)
- Sanitize dynamically (e.g., remove out-of-stock variants)
- Avoid scrapers misusing their HTML endpoints
/site-llms.xml lets them curate what LLMs see, separate from human-facing HTML.
Performance at Scale
For catalogs with 100K+ products:
- Generating Markdown per-request via Accept headers is expensive
- Pre-rendered llms.txt files can be CDN-cached
- Sitemap indexes (>50K URLs) are already battle-tested
We're not replacing sitemap.xml; we're extending it for a specific use case where clean, pre-processed data matters more than flexibility. |
> Accept: text/markdown, most eCommerce sites will return HTML
Which means the site doesn't have markdown. So, add it? There are plenty of ways to tackle this, even if you can't modify the server code.
> Generating Markdown per-request via Accept headers is expensive
No one's saying the markdown can't be pre-rendered.
> Pre-rendered llms.txt files can be CDN-cached
Every CDN I've used has a way to vary by Accept. You can even have it redirect to a different URL, or use a <link> tag that points to a markdown file. Which might even be called "llms.txt", who knows, who cares. That's the beauty of the existing standards: they're flexible.
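To make the "vary by Accept" point concrete, here is a minimal sketch of content negotiation over two pre-rendered bodies. The function names (`explicit_q`, `negotiate`) and the rule of serving Markdown only when the client lists text/markdown explicitly are assumptions for illustration, not anything a particular CDN or framework prescribes. The `Vary: Accept` response header is what lets a shared cache keep both variants under one URL.

```python
from typing import Dict, Tuple


def explicit_q(accept_header: str, media_type: str) -> float:
    """Return the q-value for media_type if the client lists it explicitly, else 0.0.

    Wildcards such as */* are deliberately ignored, so a browser's default
    Accept header never selects the Markdown variant.
    """
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != media_type:
            continue
        q = 1.0  # a media range without a q parameter defaults to 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q
    return 0.0


def negotiate(accept_header: str, html_body: str, markdown_body: str) -> Tuple[Dict[str, str], str]:
    """Pick between two pre-rendered bodies based on the Accept header."""
    md_q = explicit_q(accept_header, "text/markdown")
    html_q = explicit_q(accept_header, "text/html")
    # Vary: Accept tells caches that the response depends on the Accept header.
    headers = {"Vary": "Accept"}
    if md_q > 0 and md_q >= html_q:
        headers["Content-Type"] = "text/markdown; charset=utf-8"
        return headers, markdown_body
    headers["Content-Type"] = "text/html; charset=utf-8"
    return headers, html_body
```

Both bodies can be generated at build time, so nothing here converts HTML per request. The same pre-rendered Markdown file could just as well sit at its own URL and be advertised from the HTML page with a <link> tag, as mentioned above.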
Good god. I'm not going to debate against an AI, so don't bother generating a reply.