by adileo 763 days ago
Absolutely spot on. Additionally, it's worth mentioning that a lot of content is now locked behind a few major platforms (e.g. Facebook, LinkedIn, Medium, YouTube) or CDNs like Cloudflare, which often block crawlers whose IPs don't belong to Google or another well-known search engine.

While the other costs mentioned here can be optimized with current hardware prices and a good database, anti-crawling measures necessitate thousands of IPs/proxies, making the process even more challenging and costly.

1 comment

> Additionally, it's worth mentioning that a lot of content is now locked behind a few major platforms (eg. Facebook, LinkedIn, Medium, YouTube, etc.) or CDNs like Cloudflare, which often block crawling from non-Google IPs or well-known search engines.

I think this is fine. If I want to find something on one of those big sites, I just go there directly. However, if I want to search the web for a site I’ve never been to before, then I’m stuck with the bad results of the current search offerings. It’s quite depressing!