|
|
|
|
|
by zargath
347 days ago
|
|
Sounds very basic, sadly. Anybody know why these web crawling/bot standards are not evolving ? I believe robots.txt was invented in 1994(thx chatgpt). People have tried with sitemaps, RSS and IndexNow, but its like huge$$ organizations are depending on HelloWorld.bas tech to control their entire platform. I want to spin up endpoints/mcp/etc. and let intelligent bots communicate with my services. Let them ask for access, ask for content, pay for content, etc. I want to offer solutions for bots to consume my content, instead of having to choose between full or no access. I am all for AI, but please try to do better. Right now the internet is about to be eaten up by stupid bot farms and served into chat screens. They dont want to refer back to their source and when they do its with insane error rates. |
|
Not to pick on you, but I find it quicker to open new tab and do "!w robots.txt" (for search engines supporting the bang notation) or "wiki robots.txt"<click> (for Google I guess). The answer is right there, no need to explain to LLM what I want or verify [1].
[1] Ok, Wikipedia can be wrong, but at least it is a commonly accessible source of wrong I can point people to if they call me out. Plus my predictive model of Wikipedia wrongness gives me pretty low likelihood for something like this, while for ChatGPT it is more random.