Hacker News
by lolpython 230 days ago
How would the links be prioritized? If the bots' goal is to crawl all content, would they have prioritization built in?

How would they prioritize things they haven't crawled yet?
It's not clear that they are doing that. Web logs I've seen in other writing on this topic show them re-crawling the same pages at high rates, in addition to crawling new pages.
Actually, I've been informed otherwise. According to this person, they crawl known links first:

> Unfortunately, based on what I'm seeing in my logs, I do need the bot detection. The crawlers that visit me, have a list of URLs to crawl, they do not immediately visit newly discovered URLs, so it would take a very, very long time to fill their queue. I don't want to give them that much time.

https://lobste.rs/c/1pwq2g
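
For illustration, here is a rough sketch of the behavior described in that quote, assuming the crawler keeps a simple first-in, first-out queue and appends newly discovered URLs to the back. This is a guess at the mechanism, not anything confirmed about a real crawler; `crawl_order`, the `links` map, and the URL paths are all made up for the example.

```python
from collections import deque

def crawl_order(seed_urls, links, limit):
    """Return the order in which a hypothetical FIFO crawler visits URLs."""
    queue = deque(seed_urls)   # URLs the crawler already knows about
    seen = set(seed_urls)
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)
        # Newly discovered links go to the back of the queue,
        # so they wait behind everything already listed.
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited

# Hypothetical site: /a links into a chain of generated pages.
links = {"/a": ["/trap/1"], "/trap/1": ["/trap/2"], "/trap/2": ["/trap/3"]}
order = crawl_order(["/a", "/b", "/c"], links, limit=5)
print(order)
```

Under that assumption, the generated pages are only reached after the known URLs have been visited, which would match the "very, very long time to fill their queue" point in the quote.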