Especially for image data libraries, why not provide the images as a dump instead? There's no need to crawl 3 million images if the download button is right there. Then put the file on a CDN or Google and you're golden.
There are two immediate issues I see with that. First, you'll end up with bots downloading the dump over and over again. Second, for non-trivial amounts of data, you'll end up paying the CDN for bandwidth anyway.