A Scalable Standard for Clean eCommerce Data in LLMs (Fork of llms.txt)
2 points
by nicola_alessi
436 days ago
The Problem:
LLMs are terrible at understanding eCommerce sites. They:
Hallucinate prices/specs from messy HTML
Waste tokens on UI boilerplate (headers, popups, ads)
Struggle with real-time inventory/pricing updates

Our solution:
A fork of Answer.AI’s llms.txt that introduces site-llms.xml, an XML sitemap protocol for product data. Stores expose:
/site-llms.xml: Index of all product URLs
/product/123/llms.txt: Clean Markdown with specs/pricing (example in repo)

Benefits:
AI gets structured data instead of scraping
Stores control what’s exposed (like robots.txt)
Scales to millions of products (sitemap indexes supported)

We’re open-sourcing this under CC BY-SA (same as the sitemap protocol).
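
To make the two-file layout concrete, here is a rough sketch of how a client might read the index. It assumes site-llms.xml reuses the sitemap protocol's namespace and its urlset/url/loc and sitemapindex/sitemap/loc elements; the repo defines the real schema, so treat the element names, the shop.example URLs, and the list_entries helper as hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: site-llms.xml borrows the sitemap protocol's namespace and layout.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical index of per-product llms.txt files.
SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://shop.example/product/123/llms.txt</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://shop.example/product/456/llms.txt</loc>
  </url>
</urlset>"""

# Hypothetical index-of-indexes, for catalogs too large for one file.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://shop.example/site-llms-1.xml</loc>
  </sitemap>
</sitemapindex>"""


def list_entries(xml_text):
    """Return (kind, urls) where kind is "index" or "urlset"."""
    root = ET.fromstring(xml_text)
    ns = {"sm": SITEMAP_NS}
    # A <sitemapindex> root points at further index files;
    # anything else is treated as a <urlset> of product files.
    is_index = root.tag == f"{{{SITEMAP_NS}}}sitemapindex"
    child_path = "sm:sitemap" if is_index else "sm:url"
    urls = []
    for entry in root.findall(child_path, ns):
        loc = entry.findtext("sm:loc", default="", namespaces=ns).strip()
        if loc:  # skip entries with a missing or empty <loc>
            urls.append(loc)
    return ("index" if is_index else "urlset"), urls


if __name__ == "__main__":
    print(list_entries(SAMPLE_URLSET))
    print(list_entries(SAMPLE_INDEX))
```

A crawler would fetch /site-llms.xml, follow any index entries recursively, and then request each listed llms.txt instead of scraping the product page's HTML.
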
Would love HN’s thoughts:
Is this the right abstraction?
Could it work for non-eCommerce sites?

Repo: github.com/Lumigo-AI/site-llms (stars welcome!)