by mubou
436 days ago
I don't get why people keep trying to make llms.txt happen. Use the standards that already exist:
1. sitemap.xml says /foo exists.
2. LLM requests /foo with: Accept: text/markdown, text/html;q=0.9
3. Site responds with a Markdown rendering of /foo.
Done. Alternatively, use <link rel="alternate">. This is a solved problem, and the tools that are already available are more flexible, don't require specific URLs, and aren't LLM-specific.
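The negotiation step above can be sketched in a few lines. This is a minimal illustration, not a full RFC 9110 implementation. It assumes the server has exactly two representations (text/markdown and text/html), and the helper names `parse_accept` and `negotiate` are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Representations this hypothetical server can produce; earlier entries win ties.
AVAILABLE = ("text/markdown", "text/html")


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((fields[0].lower(), q))
    return entries


def negotiate(header: str, available=AVAILABLE) -> Optional[str]:
    """Return the available type with the highest q, or None if none is acceptable.

    Simplification: a candidate takes the highest q among all ranges that
    match it, instead of the q of the most specific matching range.
    """
    best, best_q = None, 0.0
    ranges = parse_accept(header)
    for candidate in available:
        major = candidate.split("/")[0]
        q = max(
            (w for r, w in ranges if r in (candidate, f"{major}/*", "*/*")),
            default=0.0,
        )
        if q > best_q:
            best, best_q = candidate, q
    return best


print(negotiate("text/markdown, text/html;q=0.9"))  # text/markdown
print(negotiate("text/html"))                       # text/html
```

In a real handler the response should also carry `Vary: Accept`, so that shared caches (CDNs included) store the Markdown and HTML variants separately. For the <link rel="alternate"> route, the HTML page would instead point at the Markdown copy, e.g. <link rel="alternate" type="text/markdown" href="/foo.md">, where /foo.md is a hypothetical path.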
The HTML-to-Markdown Problem
Even with Accept: text/markdown, most eCommerce sites will render their normal HTML and convert it to Markdown server-side. That means the output still contains:
Scripts/popups in <div> hell ("Subscribe to newsletter!" embedded in product specs)
Ad fragments ("Customers also bought...") polluting context windows
Layout cruft (header/footer markup in every response)
llms.txt files are handcrafted Markdown: no noise, just atomic product data.
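To make the noise problem concrete, here is a deliberately naive HTML-to-text pass built on Python's standard `html.parser`. The sample page and the `html_to_text` helper are hypothetical. Dropping `<script>` is easy, but a promo inside an ordinary `<div>` and an "also bought" widget come through looking exactly like product data.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collect text nodes, skipping only <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> list:
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical product page with typical storefront cruft.
PAGE = """
<header><nav>Home | Shop | Cart</nav></header>
<main>
  <h1>Trail Shoe</h1>
  <p>Weight: 280 g</p>
  <div class="promo">Subscribe to newsletter!</div>
</main>
<aside>Customers also bought...</aside>
<script>trackVisit();</script>
"""

print(html_to_text(PAGE))
# ['Home | Shop | Cart', 'Trail Shoe', 'Weight: 280 g', 'Subscribe to newsletter!', 'Customers also bought...']
```

A smarter converter could also drop <header>, <nav>, and <aside>, but a popup in a generic <div> has no reliable marker, which is the point being made above.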
Control Over Exposure
Retailers want to:
Expose only approved fields (e.g., hide "Compare at $X" prices)
Sanitize dynamically (e.g., remove out-of-stock variants)
Avoid scrapers misusing their HTML endpoints
/site-llms.xml lets them curate what LLMs see, separate from human-facing HTML.
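The curation described above amounts to an allow-list applied before anything is written out. A minimal sketch, assuming products arrive as plain dicts; the field names, `APPROVED_FIELDS`, and `render_product` are all hypothetical:

```python
# Fields a retailer has approved for LLM-facing output. Anything else,
# such as "compare_at_price" or internal cost data, is never emitted.
APPROVED_FIELDS = ("price", "material")


def render_product(product: dict) -> str:
    """Render one product as a small Markdown section, exposing only approved data."""
    lines = [f"## {product['name']}"]
    for key in APPROVED_FIELDS:
        if key in product:
            lines.append(f"- {key}: {product[key]}")
    # Sanitize dynamically: list only variants that are currently in stock.
    sizes = [v["size"] for v in product.get("variants", []) if v.get("in_stock")]
    if sizes:
        lines.append(f"- sizes in stock: {', '.join(sizes)}")
    return "\n".join(lines) + "\n"


product = {
    "name": "Trail Shoe",
    "price": "$89.00",
    "compare_at_price": "$120.00",  # hidden: not in APPROVED_FIELDS
    "material": "mesh",
    "variants": [
        {"size": "42", "in_stock": True},
        {"size": "43", "in_stock": False},
        {"size": "44", "in_stock": True},
    ],
}

print(render_product(product), end="")
```

Because this runs at build time rather than per request, the result is a static file that can sit behind a CDN like any other asset.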
Performance at Scale
For catalogs with 100K+ products:
Generating Markdown per request via Accept-header negotiation is expensive
Pre-rendered llms.txt files can be CDN-cached
Sitemap index files (needed once a site exceeds the 50,000-URL-per-sitemap limit) are already battle-tested
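The sitemap-index point can be illustrated with a build-time sketch that splits a large URL list into sitemap files of at most 50,000 URLs each (the per-file limit in the sitemaps.org protocol) and writes an index pointing at them. The example.com URLs and the function names are hypothetical, and the sketch does not enforce the protocol's separate 50 MB uncompressed size limit.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemaps.org protocol
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_urlset(urls):
    """One sitemap file listing the given page URLs."""
    body = "".join(f"  <url><loc>{escape(u)}</loc></url>\n" for u in urls)
    return f'{HEADER}<urlset xmlns="{NS}">\n{body}</urlset>\n'


def build_index(sitemap_urls):
    """A sitemap index file pointing at each child sitemap."""
    body = "".join(f"  <sitemap><loc>{escape(u)}</loc></sitemap>\n" for u in sitemap_urls)
    return f'{HEADER}<sitemapindex xmlns="{NS}">\n{body}</sitemapindex>\n'


def build_all(urls, pattern="https://example.com/sitemaps/products-{n}.xml"):
    """Return (index_xml, {sitemap_url: sitemap_xml}) ready to upload as static files."""
    files = {}
    for n, start in enumerate(range(0, len(urls), MAX_URLS_PER_SITEMAP), start=1):
        chunk = urls[start:start + MAX_URLS_PER_SITEMAP]
        files[pattern.format(n=n)] = build_urlset(chunk)
    return build_index(list(files)), files


# Usage: 120,000 hypothetical product URLs -> three sitemaps (50k, 50k, 20k).
urls = [f"https://example.com/p/{i}" for i in range(120_000)]
index, files = build_all(urls)
print(len(files))  # 3
```

Every output is a plain file, so it can be regenerated once per catalog update and served from a CDN instead of being computed per request.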
We’re not replacing sitemap.xml; we’re extending it for a specific use case where clean, pre-processed data matters more than flexibility.