by lnanek2 4679 days ago
Kind of strange that the bots would spike you. Wouldn't anything on a page have been used before and thus already be cached on your server as a file? I've written all this stuff before as well, including overlaying the site logo on images that were hotlinked, but I always cached anything I had to generate to the file system the first time it was requested and simply served the file after that.
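For illustration, the generate-once-then-serve approach described above might look roughly like this in Python. This is only a sketch: `CACHE_DIR`, `render`, and `get_or_generate` are hypothetical names, and `render` is a stand-in for whatever expensive step (such as compositing a logo onto an image) the real system performed.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cache location; a real deployment would pick its own directory.
CACHE_DIR = Path(tempfile.gettempdir()) / "generated_image_cache"


def render(key: str) -> bytes:
    """Stand-in for the expensive generation step (assumption, not the original code)."""
    return f"rendered:{key}".encode()


def get_or_generate(key: str, generate=render, cache_dir: Path = CACHE_DIR) -> bytes:
    """Return the cached bytes for `key`, generating and storing them on first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so arbitrary URLs or parameters map to a safe file name.
    path = cache_dir / hashlib.sha256(key.encode()).hexdigest()
    if path.exists():
        return path.read_bytes()  # cache hit: just serve the file
    data = generate(key)  # cache miss: do the expensive work once
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)  # rename into place so readers never see a partial file
    return data
```

Note that this only works when everything that can be generated fits on disk, since nothing here ever evicts a file.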
1 comment

Short version: We had direct access to all the large news image providers and analyzed the images for topic and story identification. We used this data to dynamically generate hundreds of thousands of pages. That was way too much data to cache to file (our storage cluster was many, many terabytes), so we used a CDN to cache the images. When a bot scraped the site, it hit all the old pages that had fallen out of the CDN cache, so every one of those requests came back to our servers. Perhaps it was our fault for having such a large sitemap.xml.
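A toy simulation can show why a full crawl hurts a bounded cache far more than ordinary traffic does. This is a sketch under stated assumptions: it models the cache as a simple LRU (the CDN's actual eviction policy is not stated in the thread), and the page counts, capacity, and the names `LRUCache` and `miss_count` are made up for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal least-recently-used cache that only tracks hits and misses."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key) -> None:
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1  # a miss means the origin has to generate the page
            self.items[key] = True
            if len(self.items) > self.capacity:
                self.items.popitem(last=False)  # evict the least recently used entry


def miss_count(requests, capacity: int) -> int:
    """Replay a request sequence through a fresh cache and count origin hits."""
    cache = LRUCache(capacity)
    for page in requests:
        cache.get(page)
    return cache.misses


# Made-up numbers: 10,000 pages in the sitemap, room for 100 in the cache.
human_traffic = [i % 50 for i in range(10_000)]  # readers cluster on 50 recent pages
bot_crawl = list(range(10_000))                  # a crawler walks every page once

print(miss_count(human_traffic, 100))  # 50
print(miss_count(bot_crawl, 100))      # 10000
```

Both workloads send the same number of requests, but the crawl misses on every one of them, which matches the kind of load spike described above.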