Because nowadays, more than ever, the content you need is in silos: Facebook, Twitter, Instagram, Stack Overflow, Reddit... And they all have limited, expensive APIs and bulk-scraping detection.
Sure, you can cobble together something that will work for a while, but you can't run a business on that. Additionally, most paywalled sites (like news) explicitly whitelist Google and Bing, and whenever someone creates a new site, they do the same. As an upstart, you would have to reach out to them to get whitelisted, and you would need to do it not only in the USA but globally. Another problem is Cloudflare and other CDNs/web firewalls, so even trying to index a mom-and-pop blog could be problematic. And most mom-and-pop blogs nowadays are on some blogging platform that is just another silo. Now that I think about it, Cloudflare might be in a good position to do it. The AI hype, and the scraping of content to feed the models, have made it harder for anyone new to start a new index.
The decentralized nature of the internet was amazing for businesses, and monopolization could ruin the space and slow innovation down significantly.