| The major problem with Brave Search is their position on indexing and licensing content against the wishes of the website publisher. Their robot does not identify itself, meaning the publisher cannot use the standard robots.txt to block its crawling if the publisher so wishes. Incidentally, the robots.txt file has been used in court cases litigating whether a search engine is legal. Even worse, they state that Brave Search will skip a page only if other search engines are also not allowed to index it. Morally, that is not their call to make. A publisher should have full control over which search engines index the website's content. That's the very heart of why the Robots Exclusion Protocol exists, and Brave is brazenly ignoring it. Even worse than that, the Brave Search API allows you (for an extra fee) to get the content with a "license" to use the content for AI training. Who gave them the right to distribute the content that way? I wrote about all this here: https://searchengineland.com/crawlers-search-engines-generat... and more references elsewhere in this thread: https://news.ycombinator.com/item?id=36989129 Amusingly, while I was writing my article, this got posted to their forums, asking how to block their crawler: https://community.brave.com/t/stop-website-being-shown-in-br... No reply so far. |
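For anyone unfamiliar with why the missing user agent matters, here is a minimal sketch of how per-crawler blocking in robots.txt works, using Python's standard `urllib.robotparser`. The `ExampleBot` token, the `can_crawl` helper, and the example.com URL are all hypothetical and only there for illustration; this is not a description of how any particular crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/page"
    # A crawler that announces its own token can be singled out and blocked.
    print(can_crawl("ExampleBot/1.0", page))
    # A crawler that sends a generic user agent only ever matches the "*" group.
    print(can_crawl("Mozilla/5.0", page))
```

The point of the sketch: rules are keyed on the user agent token, so a crawler with no distinct token can only be blocked by disallowing everyone under `*`.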
If you post something to the open web, what's it to you who reads it and how? You can block some IPs but that's about it.
I don't know if Brave has a knowledge graph - if they do, I would understand objecting if they filled it in with “stolen” content. But I don't see what the problem is with search.
By the way, isn't everyone's favourite archive.is doing the same thing?
I have no strong opinion on this, and I'm curious to hear counterarguments.