In this case, sure... that said, I've worked on a few sites where more than half the traffic was bots because the content was useful for other sites (classic car classifieds/sales site). Just over half the page requests were actually search query results, which is what drove a lot of the optimization work in practice... Implementing a "search" database (MongoDB and Elastic were pretty new at the time), denormalizing a lot of the data from the "enterprise" SQL structures for search and display for logged-out users, etc. Heavier caching, donut caching, etc.
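For anyone who hasn't run into donut caching: the idea is to cache the expensive, shared part of a page and leave a small "hole" that gets filled in per request. A minimal sketch of the idea, not what that site actually ran; the names (`HOLE`, `render_listing_shell`, `get_page`) and the in-memory dict cache are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical placeholder marking the uncached "hole" in the cached page.
HOLE = "{{user_panel}}"

# In-memory cache of rendered page shells, keyed by listing id (assumption:
# a real site would use a shared cache with expiry instead of a dict).
_shell_cache: Dict[str, str] = {}

# Counts how often the expensive render runs, to show the cache working.
stats = {"shell_renders": 0}


def render_listing_shell(listing_id: str) -> str:
    """Stand-in for the expensive render of everything shared by all visitors."""
    stats["shell_renders"] += 1
    return f"<h1>Listing {listing_id}</h1>{HOLE}<p>Details...</p>"


def get_page(listing_id: str, user_name: Optional[str]) -> str:
    """Return the full page: cached shell plus a per-request user panel."""
    shell = _shell_cache.get(listing_id)
    if shell is None:
        shell = render_listing_shell(listing_id)
        _shell_cache[listing_id] = shell
    # The hole is rendered on every request and never cached.
    if user_name:
        panel = f"<div>Welcome back, {user_name}</div>"
    else:
        panel = "<div>Log in</div>"
    return shell.replace(HOLE, panel)
```

Calling `get_page("42", None)` and then `get_page("42", "ann")` renders the shell once and only swaps the panel, which is where the savings come from when most traffic is anonymous or bots.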
It was an interesting and sometimes fun part of my career. Working on a site/application that isn't necessarily a tech site, and that I have a personal interest in, was pretty great... some of the pace for sales/commercial features less so, with sales making deals requiring deep integrations on impossible timelines. You learn a lot when a self-hosted site is being kicked while it's down... and from the cloud migration to make better use of flexible resources, etc.
That's too funny. If true, really looking forward to the Cloudflare response here. I'm unsure how you would spin that in a way that didn't seem self-serving.
It's very clearly disclosed in the linked docs already: they say that Cloudflare Bot Protection will block it the same as all other bots, unless you choose to allow it as an exception. If they didn't do it that way, people would accuse them of either bypassing their own product (possibly anticompetitive) or just having a low-quality one.
So it doesn't take any action to work around other bot protections? Feels like that would be on the list of features an AI company wanting to scrape would ask for.
Cloudflare crawl respects robots.txt. It does not attempt to bypass any anti-crawling measures. If the site doesn't want to be crawled -- whether it uses Cloudflare or not -- this product will not help you crawl it.
Some sites actually want crawlers -- e.g. sites that are selling a product, documentation, etc. That's what this product is meant for.
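To make "respects robots.txt" concrete, here is a rough sketch of the check a well-behaved crawler does before fetching a URL, using Python's standard `urllib.robotparser`. This is not Cloudflare's implementation; the `ExampleCrawler` user agent, the sample rules, and the `is_allowed` helper are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler may fetch everything except
# /private/, and every other crawler is disallowed entirely.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleCrawler", "https://example.com/docs/intro"),
        ("ExampleCrawler", "https://example.com/private/report"),
        ("OtherBot", "https://example.com/docs/intro"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that skips any URL where this returns `False` simply never requests the page, which is different from trying and then working around a block.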
Is this just a way to strong-arm non-cloudflarians into adopting their platform if they don't want their sites crawled? It does sound like they are selling the solution to avoid their own content crawler.
fuck firecrawl. they copied my idea by showing interest in my product and then copied it, used their YC money to give it all out for free. fuck nick in particular. I'm still salty over this
"they copied my idea by showing interest in my product and then copied it". What exactly is revolutionary about Firecrawl or your product? Scraping APIs have been around for over a decade.
I was the first to return markdown and use reader mode stuff to strip irrelevant stuff. There's copying, and there's talking to the founder and sounding interested while your team copies what I did in the background. One is fair game, the other is a dickhead move.
I think that is a neat idea and it sucks this happened, but how long before somebody simply saw that feature and replicated it? I'm curious, had you considered a deeper moat than that?
This is especially relevant given AI is making this kind of thing easy at an industrial scale. I think we should all be looking for alternative moats.
Sometimes timing is your moat and that's all you need. That being said I'll probably start limiting my public releases to revolve around standards I want implemented.
I'm rethinking the sources of value moats are built around. It seems like the landscape is changing and dimensions such as location, perspective, experience, and attention weigh more than they used to.
> but how long before somebody simply saw that feature and replicated it?
This is a good example. The, idk, "value store" of your org just switched from products and services to the employees who understand your process from a couple angles and can write well.
“Buy Cloudflare bot protection, otherwise it would be a shame if your site got scraped and ddos’d.”
Who is doing the scraping and ddosing? Cloudflare.