Hacker News
by mtlynch 429 days ago
>I think in the future, websites will learn to serve pure markdown to these bots instead of blocking. That way, websites prevent bandwidth overages like in the article, while still informing LLMs about the services their website provides.

Why?

There's no value to the website for a bot scraping all of their content and then reselling it with no credit or payment to the original author.

3 comments

Unless you're selling something. If you have articles praising your product/service/person and "comparison" articles of the "top 10 X 2025" (your offering happens to be number one) you want the bots to find you.

The LLM SEO game has only just begun. Things will only go downhill from here.

Or technical docs. For example:

https://bun.sh/llm.txt

I love that! That's one of my biggest pain points: wrong/outdated usage of dependencies.

OP in this case is by no means the original author. In the linked post, they mentioned that they scrape third parties themselves. OP's bots might not be as sophisticated, but they're still "borrowing" others' content the same way.

ChatGPT and others have some sort of attribution, where they link to the original webpage. How or when they decide to attribute is unclear. But websites are starting to pay attention to GEO (generative engine optimization) so that their brand isn't entirely ignored by ChatGPT and others.

I do agree that LLM-as-a-search is likely to become more and more prevalent as inference gets cheaper and faster, and as people don't care too much about 'minor' hallucinations.

What I don't see, however, is any way this new way of searching will give back. There is some handwaving argument about links, but the entire value prop of an LLM is that you DON'T need to go to the source content.

Could have just left it as SEO and changed the S to "Slop".