It will happen, and it has already started to happen. It started even before LLMs, when Google began hiding smaller personal blogs in its search results. The expectation of a monetary reward has nothing to do with it; discoverability does. A culture of creating content does not exist when people can't see what others have created and know that no one will see what they create.

A lot of smaller open source was a monkey-see, monkey-do thing: we saw other open source projects and wanted to make something like that. Likewise with tutorials: we saw other people write cool tutorials and felt like creating our own and showing them off. That is not the dynamic with LLMs. You see the LLM output, but the original creator is hidden. And if you write your own, no one will find it. Worse, other people will tell you "an LLM could have written it" in reaction ... so people won't bother.

> search engines copied and indexed web pages

Notably, search engines sent people toward web pages. And when search engines stopped doing that and started to copy content, those original pages started to die out.

> Printing presses copied manuscripts

The printing press made dissemination easier. It is the equivalent of the early internet, not of LLMs.

> open-source software was incorporated into commercial products

A commercial product using an open source library had different users than the library it was using. And crucially, it does not hide that library from the library's users.

> Wikipedia repackaged knowledge produced elsewhere

Yes, and we collectively create fewer encyclopedias. They are not worth writing and checking for correctness anymore, so we don't do that much anymore.
Imagine you have a 4B model and keep an equal-size corpus of search stubs: small Markdown documents linking to hubs, feeds, and search engines for millions of topics. You can use the LLM to read the stub and perform the search for you, all orchestrated locally, with greater privacy and independence. You can disintermediate the search chokepoint now. You can set the criteria for what to include and exclude, and how to rank and present the results.
This works because good entry points for any topic change slowly over time. Constructing the search stubs is trivial with existing AI agents, and they can be shared as open source. A few GB for the model, a few for the search routing layer, and you've got a sovereign local agent.
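To make the routing layer a bit more concrete, here is a rough sketch of how it might work. Everything in it is an assumption for illustration: the stub format (a Markdown title plus a list of links, with a `{query}` placeholder for search endpoints), the `example.*` URLs, and the names `parse_stub`, `pick_stub` and `route`. The keyword-overlap matcher is only a stand-in for the step where the local model would read the stubs and choose one.

```python
import re
from dataclasses import dataclass
from urllib.parse import quote_plus, urlparse

# Hypothetical stub corpus: topic key -> small Markdown document.
# The URLs are placeholders, not real entry points.
STUBS = {
    "rust-async": (
        "# Rust async programming\n"
        "- [Guide](https://docs.example.org/rust-async/)\n"
        "- [Forum search](https://forum.example.org/search?q={query})\n"
        "- [Sponsored index](https://ads.example.net/rust?q={query})\n"
    ),
    "sourdough": (
        "# Sourdough baking\n"
        "- [Baking wiki](https://wiki.example.com/sourdough)\n"
    ),
}

# Matches Markdown links of the form [label](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


@dataclass(frozen=True)
class EntryPoint:
    label: str
    url_template: str
    domain: str


def parse_stub(markdown):
    """Return (title, entry points) for one stub document."""
    title = markdown.strip().splitlines()[0].lstrip("# ").strip()
    entries = [
        EntryPoint(label, url, urlparse(url).netloc)
        for label, url in LINK_RE.findall(markdown)
    ]
    return title, entries


def pick_stub(query, stubs):
    """Stand-in for the local model: pick the stub whose title shares
    the most words with the query, or None if nothing overlaps."""
    words = set(query.lower().split())

    def overlap(key):
        title, _ = parse_stub(stubs[key])
        return len(words & set(title.lower().split()))

    best = max(stubs, key=overlap)
    return best if overlap(best) > 0 else None


def route(query, stubs, exclude_domains=frozenset(), prefer_domains=()):
    """Apply user-chosen include/exclude and ranking rules, then fill
    the query into each remaining entry point."""
    key = pick_stub(query, stubs)
    if key is None:
        return []
    _, entries = parse_stub(stubs[key])
    kept = [e for e in entries if e.domain not in exclude_domains]
    # Stable sort: preferred domains first, stub order otherwise.
    kept.sort(key=lambda e: e.domain not in prefer_domains)
    return [e.url_template.replace("{query}", quote_plus(query)) for e in kept]


if __name__ == "__main__":
    for url in route(
        "async rust tutorial",
        STUBS,
        exclude_domains={"ads.example.net"},
        prefer_domains=("forum.example.org",),
    ):
        print(url)
```

The point of the sketch is that the exclude list and the ranking rule live in your own code, so what gets filtered out or promoted is your call rather than the search engine's.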
If this holds, access control shifts from whatever Google thinks maximizes profit to whatever the community thinks has value.