
by nicola_alessi 436 days ago

You're absolutely right that existing standards like sitemap.xml + Accept headers could work in theory, but here's why we built this for eCommerce specifically:

The HTML-to-Markdown problem. Even with Accept: text/markdown, most eCommerce sites will return HTML, which then has to be converted to Markdown server-side. This means:

Scripts/popups in <div> hell ("Subscribe to newsletter!" embedded in product specs)

Ad fragments ("Customers also bought...") polluting context windows

Layout cruft (header/footer markup in every response)

llms.txt files are handcrafted Markdown – no noise, just atomic product data.

Control over exposure. Retailers want to:

Expose only approved fields (e.g., hide "Compare at $X" prices)

Sanitize dynamically (e.g., remove out-of-stock variants)

Avoid scrapers misusing their HTML endpoints

/site-llms.xml lets them curate what LLMs see, separate from human-facing HTML.
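To make the curation point concrete, here is a minimal sketch of rendering one product to an LLM-facing Markdown file using a field whitelist, with out-of-stock variants dropped at render time. The `Product`/`Variant` schema, the `APPROVED_FIELDS` tuple, the `render_llm_markdown` helper, and the output layout are all hypothetical illustrations. The thread does not specify the actual llms.txt or /site-llms.xml content format.

```python
from dataclasses import dataclass, field


# Hypothetical catalog schema; field names are illustrative only.
@dataclass
class Variant:
    sku: str
    size: str
    in_stock: bool


@dataclass
class Product:
    title: str
    price: str
    compare_at_price: str  # retailer-internal; must not leak into the output
    description: str
    variants: list[Variant] = field(default_factory=list)


# Assumed whitelist: only these fields reach the LLM-facing file.
APPROVED_FIELDS = ("price", "description")


def render_llm_markdown(product: Product) -> str:
    """Render approved fields and in-stock variants as Markdown."""
    lines = [f"# {product.title}", ""]
    for name in APPROVED_FIELDS:
        lines.append(f"- {name}: {getattr(product, name)}")
    # Drop out-of-stock variants instead of exposing them.
    in_stock = [v for v in product.variants if v.in_stock]
    if in_stock:
        lines += ["", "## Variants"]
        lines += [f"- {v.sku}: size {v.size}" for v in in_stock]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    shoe = Product(
        "Trail Shoe", "$89", "$120", "Lightweight trail runner.",
        [Variant("TS-9", "9", True), Variant("TS-10", "10", False)],
    )
    print(render_llm_markdown(shoe), end="")
```

Because the result is a plain string per product, it can be written out whenever the catalog changes and served as a static file, rather than generated on each request.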

Performance at scale. For catalogs with 100K+ products:

Generating Markdown per-request via Accept headers is expensive

Pre-rendered llms.txt files can be CDN-cached

Sitemap indexes (for catalogs past the 50K-URLs-per-file limit) are already battle-tested
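The 50K figure comes from the sitemaps.org protocol, which caps each sitemap file at 50,000 URLs (and 50 MB uncompressed). Larger catalogs split their URL list and reference the pieces from a sitemap index. The sketch below assumes the LLM-facing files would be listed with the standard sitemap schema, which the thread does not confirm. The `https://shop.example/...` URL layout is hypothetical, and only the URL-count limit is enforced, not the size limit.

```python
from xml.sax.saxutils import escape

# sitemaps.org protocol: at most 50,000 URLs per sitemap file.
MAX_URLS = 50_000
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'


def chunk(urls: list[str], size: int = MAX_URLS) -> list[list[str]]:
    """Split a URL list into sitemap-sized pieces."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap(urls: list[str]) -> str:
    """One <urlset> file; escape() handles &, < and > in URLs."""
    body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return f'{HEADER}<urlset xmlns="{NS}">{body}</urlset>\n'


def build_index(sitemap_urls: list[str]) -> str:
    """A <sitemapindex> pointing at each chunked sitemap file."""
    body = "".join(f"<sitemap><loc>{escape(u)}</loc></sitemap>" for u in sitemap_urls)
    return f'{HEADER}<sitemapindex xmlns="{NS}">{body}</sitemapindex>\n'


if __name__ == "__main__":
    # Hypothetical URL layout for per-product Markdown files.
    products = [f"https://shop.example/p/{i}.md" for i in range(120_000)]
    parts = chunk(products)
    files = {f"https://shop.example/sitemap-{n}.xml": build_sitemap(p)
             for n, p in enumerate(parts, start=1)}
    index = build_index(list(files))
    print(len(parts), [len(p) for p in parts])  # 3 [50000, 50000, 20000]
```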

We’re not replacing sitemap.xml – we’re extending it for a specific use case where clean, pre-processed data matters more than flexibility.


mubou 436 days ago

Come on, this is clearly copy-pasted from ChatGPT. You can plainly see where the headings and bullet points were. And it's just rehashing the benefits of markdown, anyway, which isn't relevant. Did you even bother to read this slop?

> Accept: text/markdown, most eCommerce sites will return HTML

Which means the site doesn't have markdown. So, add it? There are plenty of ways to tackle this, even if you can't modify the server code.

> Generating Markdown per-request via Accept headers is expensive

No one's saying the markdown can't be pre-rendered.

> Pre-rendered llms.txt files can be CDN-cached

Every CDN I've used has a way to vary by Accept. You can even have it redirect to a different url, or use a <link> tag that points to a markdown file. Which might even be called "llms.txt", who knows, who cares. That's the beauty of the existing standards: they're flexible.
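The existing mechanism referred to here is HTTP content negotiation: the origin picks a representation from the Accept header and sends `Vary: Accept` so shared caches store the HTML and Markdown copies separately. Below is a minimal origin-side sketch that chooses between pre-rendered files. The function names and file paths are hypothetical, parsing ignores media-type parameters other than `q`, and it falls back to HTML instead of returning 406. How a particular CDN keys its cache on `Vary` is vendor-specific.

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept header into (media_range, q) pairs."""
    prefs = []
    for part in header.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        if not media:
            continue
        q = 1.0
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media.lower(), q))
    return prefs


def match_q(media: str, prefs: list[tuple[str, float]]) -> float:
    """q for `media`; the most specific matching range wins (RFC 9110)."""
    major = media.split("/")[0]
    best_spec, best_q = -1, 0.0
    for rng, q in prefs:
        # exact type > type/* > */*; non-matching ranges score -1
        spec = {media: 2, f"{major}/*": 1, "*/*": 0}.get(rng, -1)
        if spec > best_spec:
            best_spec, best_q = spec, q
    return best_q


def negotiate(accept: Optional[str],
              available=("text/html", "text/markdown")):
    """Choose a representation; earlier entries in `available` win ties."""
    prefs = parse_accept(accept or "*/*")
    chosen, chosen_q = available[0], -1.0
    for media in available:
        q = match_q(media, prefs)
        if q > chosen_q:
            chosen, chosen_q = media, q
    headers = {
        "Content-Type": f"{chosen}; charset=utf-8",
        # Tells shared caches the response depends on the Accept header.
        "Vary": "Accept",
    }
    return chosen, headers


if __name__ == "__main__":
    # Hypothetical pre-rendered files produced at build time.
    files = {"text/html": "p/123.html", "text/markdown": "p/123.md"}
    for accept in ("text/html,*/*;q=0.8", "text/markdown", None):
        media, hdrs = negotiate(accept)
        print(accept, "->", files[media], hdrs["Vary"])
```

Ties go to HTML, so browsers sending `*/*` keep getting the human-facing page, while a client that explicitly asks for `text/markdown` gets the pre-rendered Markdown file.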

Good god. I'm not going to debate against an AI, so don't bother generating a reply.