by jonshea 5127 days ago
We do the simplest thing we could imagine. Our sitemap of ~50,000,000 entries is written to static XML once a week as part of a batch job and pushed to S3. Is there any reason to believe it needs to be updated in near real time? How often does Google read yours?
1 comments

One thing: freshness is a ranking factor for Google. If you can tell Google about a fresh page at the exact moment new content arrives (and there is a chance of a "fresh" spike in search demand for it), that works in your favor.

But more important:

With 50,000,000 URLs, if your site gets crawled at about 500,000 pages a day (which is average) or 1M pages a day (which is good), it already takes 50 to 100 days to crawl and index your whole site (50,000,000 / 1,000,000 = 50 days; 50,000,000 / 500,000 = 100 days). So it makes sense to communicate only the changed sitemaps to Google, at the exact time they change. Because sitemaps get fetched quite fast, this ups your chances that a new landing page gets crawled and indexed sooner. Whether it makes sense for you depends on how fast your page turnover is (new, updated, and deleted pages).
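
To make the arithmetic and the "only the changed sitemaps" idea concrete, here is a minimal sketch in Python. It assumes the sitemap is split into shard files listed in a sitemap index, and that a shard's `<lastmod>` (part of the sitemaps.org protocol) is bumped only when that shard changes. The shard URLs and the function names `days_to_crawl` and `build_sitemap_index` are hypothetical, not something from the setup described above.

```python
from datetime import datetime, timezone
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def days_to_crawl(total_urls: int, pages_per_day: int) -> float:
    """Days needed to crawl every URL once at a constant crawl rate."""
    return total_urls / pages_per_day


def build_sitemap_index(shards: dict[str, datetime]) -> str:
    """Render a sitemap index from {shard URL: last modification time}.

    Assumption: the caller updates a shard's timestamp only when that
    shard's contents changed, so unchanged shards keep their old lastmod.
    Timestamps must be timezone-aware.
    """
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url, modified in sorted(shards.items()):
        node = ET.SubElement(root, "sitemap")
        ET.SubElement(node, "loc").text = url
        # W3C datetime format in UTC, as allowed by the sitemap protocol.
        stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        ET.SubElement(node, "lastmod").text = stamp
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(days_to_crawl(50_000_000, 500_000))    # average crawl rate
    print(days_to_crawl(50_000_000, 1_000_000))  # good crawl rate

    # Hypothetical shards: only the second one changed recently.
    shards = {
        "https://example.com/sitemaps/shard-0001.xml": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "https://example.com/sitemaps/shard-0002.xml": datetime(2024, 3, 5, 12, 30, tzinfo=timezone.utc),
    }
    print(build_sitemap_index(shards))
```

Whether a crawler actually prioritizes shards with a newer `<lastmod>` is up to the crawler; the sketch only shows how the signal could be published.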

(P.S.: in most cases, for most businesses, a (near) real-time sitemap is unnecessary overhead.)