
by westside1506 6323 days ago

Our service lets you push your code into the system rather than pulling back all of the page contents. You end up running your semantic analysis, image analysis, or whatever else you want to do on our grid. Specifically, you implement a processPage() function of the following form:

byte[] processPage(String url, byte[] pageContents, Object userData);

We run your function on the contents of the pages/images/objects you want to analyze and give you back your results from the millions or billions of pages you want to analyze.

The results from the processPage() function are completely free-form. You serialize your results into a byte array, and that's what you get back, once for each of your URLs.
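To make that concrete, here is a minimal sketch of what an implementation might look like. Only the processPage() signature comes from the description above. The class name, the choice to pass a search term through userData, and the "url, tab, count" result format are all assumptions made up for illustration, not part of the actual service.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class; only the processPage() signature is from the post.
final class WordCountSketch {

    // Counts case-insensitive occurrences of a search term in the page and
    // returns "<url>\t<count>" serialized as UTF-8 bytes.
    // Assumption: userData carries the search term as a String.
    // Assumption: the page bytes can be decoded as UTF-8 text.
    static byte[] processPage(String url, byte[] pageContents, Object userData) {
        String term = String.valueOf(userData).toLowerCase();
        String text = new String(pageContents, StandardCharsets.UTF_8).toLowerCase();

        int count = 0;
        if (!term.isEmpty()) {
            int idx = text.indexOf(term);
            while (idx >= 0) {
                count++;
                // Continue searching after the current match (non-overlapping).
                idx = text.indexOf(term, idx + term.length());
            }
        }

        // The result format is free-form; a tab-separated line is one option.
        String result = url + "\t" + count;
        return result.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] page = "<p>Crawl the web, then crawl it again.</p>"
                .getBytes(StandardCharsets.UTF_8);
        byte[] out = processPage("http://example.com/", page, "crawl");
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

The point of the sketch is that the returned array can be much smaller than the page itself, so only the small result has to travel back to you.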

Now, since the processPage() function is free form, you can just turn around and "return pageContents;" from your function. That will give you all of the page contents from your crawl. That's not an ideal case for us, but we can handle it. We might eventually charge a small bandwidth or storage cost for this type of usage, but we do not intend to do so for our normal use case.

The bigger cost to customers who try to pull back all of the contents will be their own local bandwidth. They would need to pull every page's contents down to their own servers, which will cost them quite a lot unless they have their own fat pipe.

In summary, $2/million-pages-crawled is our real price and is not just marketing.


jlees 6323 days ago

That's pretty cool. Thinking aloud, then: if I wanted to, say, pull out all the adjectives from results matching $foo, I'd end up getting that data back and then have to pipe it into storage myself, costing me both bandwidth in and bandwidth out. Have you thought about cutting out the middleman and letting people write directly to S3? (Yes, I have no idea how complicated this might be.)


jdrock 6323 days ago

Hey - I work for 80legs as well so thought I'd chime in and answer this question (westside is grabbing some food). We have thought about offering easy integration with AWS, but we'd probably implement this at a later time if we decided to go that route.
