| Our service lets you push your code into the system rather than pull back all of the page contents. You end up running your semantic analysis, image analysis, or whatever else you want to do on our grid. Specifically, you implement a processPage() function of the following form: byte[] processPage(String url, byte[] pageContents, Object userData); (EDIT: removed the code tag, which didn't work...) We run your function on the contents of the pages/images/objects you want to analyze and give you back your results from the millions or billions of pages involved. The results from processPage() are completely free-form: you serialize your results into a byte array, and that's what you get back, once for each of your URLs. Since processPage() is free-form, you can simply "return pageContents;" from your function, which gives you all of the page contents from your crawl. That's not an ideal case for us, but we can handle it. We might eventually charge a small bandwidth or storage cost for this type of usage, but we do not intend to do so for our normal use case. The bigger cost to customers who pull back all of the contents will be their own local bandwidth: they would need to download every page's contents to their own servers, which will cost quite a lot unless they have their own fat pipe. In summary, $2/million-pages-crawled is our real price and is not just marketing. |
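
To make the processPage() contract concrete, here is a minimal sketch in Java. Only the method signature comes from the post. The class name PageProcessors, the choice of static methods, the UTF-8 assumption about page bytes, and the "url, tab, word count" result format are all hypothetical choices made for illustration, not part of the actual service API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder class; the real service may expect an interface or
// instance method instead. Only the processPage signature is from the post.
class PageProcessors {

    // Typical use: analyze the page on the grid and return a small result.
    // Here the "analysis" is a whitespace-delimited word count, serialized
    // as UTF-8 text in the form "<url>\t<count>". Decoding the page as UTF-8
    // is an assumption; real pages may use other encodings.
    static byte[] processPage(String url, byte[] pageContents, Object userData) {
        String text = new String(pageContents, StandardCharsets.UTF_8).trim();
        int words = text.isEmpty() ? 0 : text.split("\\s+").length;
        String result = url + "\t" + words;
        return result.getBytes(StandardCharsets.UTF_8);
    }

    // The pass-through case described in the post: return the raw contents
    // unchanged, so the crawl hands back every page in full.
    static byte[] passThrough(String url, byte[] pageContents, Object userData) {
        return pageContents;
    }
}
```

The first variant returns a few bytes per page, while the pass-through variant returns the whole page, which is why the second one shifts the cost onto the caller's own bandwidth.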