Hacker News
by rickette 20 days ago
Do any of the LLM providers actually use llms.txt?

If I remember correctly, this "standard" was set up by someone without the involvement of any of the major AI players.

3 comments

I can definitively say llms.txt is not used by any AI players. I run a blogging platform with around 80k blogs and /llms.txt is not requested by anything (other than humans checking to see if there's an llms.txt path).

All regular pages are aggressively scraped to the extent it's a problem I have to consistently manage, but not llms.txt.
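For anyone wanting to run the same check on their own logs, here is a minimal sketch. It assumes access logs in the common "combined" format; the function name `count_hits_by_agent` and the sample log lines are hypothetical and not taken from the platform described above.

```python
import re
from collections import Counter

# Assumes the "combined" access log format:
#   ip - - [time] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_hits_by_agent(lines, path="/llms.txt"):
    """Count requests for `path`, grouped by user-agent string."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        # Ignore any query string when comparing paths.
        requested = match.group("path").split("?", 1)[0]
        if requested == path:
            hits[match.group("agent")] += 1
    return hits


# Hypothetical sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /llms.txt HTTP/1.1" 404 0 "-" "Mozilla/5.0 (hypothetical browser)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /posts/hello HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '203.0.113.9 - - [01/Jan/2025:00:00:03 +0000] "GET /llms.txt?x=1 HTTP/1.1" 404 0 "-" "ExampleBot/1.0"',
]

for agent, count in count_hits_by_agent(sample).most_common():
    print(count, agent)
```

Grouping by user agent is what lets you tell curious humans (browser agents) apart from crawlers, though user-agent strings can be spoofed, so treat the result as a rough signal.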

Amazing, I didn't know.

So it gets even stranger: I am the only one reading those /llms.txt files ...

I'm seeing quite a few requests for these on my work's GitBook documentation site.

But perhaps these are developers specifically targeting these pages to feed whatever LLM they are using.

How is a static blog being scraped a problem? Do you not use a CDN?
> a blogging platform with around 80k blogs

But nah, I'm sure OP doesn't know about CDNs.

Are all blogs static though?
Very few blogs require frequent updates. Even with user comments.
> I can definitively say llms.txt is not used by any AI players.

  https://developers.openai.com/llms.txt
  https://docs.anthropic.com/llms.txt
  https://geminicli.com/llms.txt
  https://github.com/llms.txt
  https://docs.aws.amazon.com/llms.txt
  https://openrouter.ai/docs/llms.txt
OP clearly meant that the AI players are not reading and/or honouring llms.txt of other websites when scraping.
I stand corrected, but what was clear to you obviously was not clear to me.
No, requesting "Accept: text/markdown" in the headers and returning markdown is the more agreed-upon standard at this point.[0]

[0] - https://acceptmarkdown.com/
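To make the content-negotiation idea concrete, here is a rough sketch of how a server might decide whether to answer with markdown. It is an illustration under simplifying assumptions, not any site's actual implementation: the helper names `parse_accept` and `prefers_markdown` are made up, and wildcards such as `*/*` are deliberately not treated as a request for markdown.

```python
def parse_accept(header):
    """Return (media_type, q) pairs parsed from an Accept header value."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((media_type, q))
    return entries


def prefers_markdown(header):
    """True when text/markdown is listed explicitly and ranks at least as high as text/html.

    Simplification (assumption): wildcard ranges like */* are ignored, so an
    ordinary browser request keeps getting HTML.
    """
    weights = dict(parse_accept(header))
    markdown_q = weights.get("text/markdown", 0.0)
    html_q = weights.get("text/html", 0.0)
    return markdown_q > 0 and markdown_q >= html_q


print(prefers_markdown("text/markdown"))
print(prefers_markdown("text/html,application/xhtml+xml,*/*;q=0.8"))
```

A server using something like this would presumably also send `Content-Type: text/markdown` on the markdown variant and `Vary: Accept` on both, so caches keep the two representations apart.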

Now, it would be super cool to get markdown and zero javascript bundles…
If you want to see what that looks like, I one-shot a browser with Claude that does it[0]. Docs pages are early adopters of this[1][2], so that AI agents can better handle tasks.

[0] - https://github.com/solumos/md-browse

[1] - https://docs.stripe.com

[2] - https://vercel.com/docs

I just found out Cloudflare supports real-time HTML-to-markdown conversion.[0]

[0] - https://blog.cloudflare.com/markdown-for-agents/#convert-htm...

This is interesting. I should start incorporating this -- it couldn't hurt to do both.
Yes, they do.

Anyone who's even slightly clued in to how agents access documentation has been making changes to their pages. For example: https://searchtxt-web.fly.dev/search?q=aws