HN Mirror


	by jen20 3821 days ago
	Has the author (if they are reading here) considered using Joyent's Manta to take the processing to the data instead?

4 comments

vgt 3821 days ago

There are plenty of architectures that do exactly this: EMR-on-S3, Google Dataproc on GCS, Snowflake-on-S3, BigQuery-on-GCS, etc.

The bigger point in the article is that these exact "take processing to the data" architectures operate exceedingly well on S3, GCS, Azure.

And, as a biased observer, I'd say these architectures operate best on GCS, thanks to the strong performance measured in the article, quick VM standup times, low VM prices, and per-minute billing.

zbjornson 3821 days ago

I'm still trying to parse the docs and Manta source code to see what it actually does, but it seems unique if the data storage nodes are also the data processing nodes and no data transfer happens from some storage service before the job begins. The other key factor is having neither startup time nor the cost of a perpetually running cluster. Per my comment below [1], we have used Lambda with S3 to get something like this, as well as our own architecture built on plain EC2/GCE nodes.

[1] https://news.ycombinator.com/item?id=10846514

qaq 3820 days ago

Not only that, but the thing is built by people who really know what they are doing, like Bryan Cantrill and other former top Sun people.

vgt 3821 days ago

Got it, thanks!

justinsaccount 3821 days ago

Are you sure you understand what "take the processing to the data" means?

EMR-on-S3 is the "copy the data to the processing nodes" variety.

linc01n 3821 days ago

I think Manta is a better fit when the result set is smaller than the input set, so network performance won't matter that much. Per-second pricing is also better, since the author needs the result in 10 seconds.

Spinning up a cluster of VMs, using it for 10 seconds, and being charged for a minimum of 1 hour seems expensive to me.
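
To put rough numbers on the billing point, here is a minimal sketch in Python. The node count and the hourly price are made-up placeholders, not any provider's real rates, and `billed_seconds` and `job_cost` are hypothetical helper names. It only compares how a 10-second job is rounded up under per-hour, per-minute, and per-second billing.

```python
import math


def billed_seconds(runtime_s, granularity_s, minimum_s=0):
    """Round the runtime up to the billing granularity, then apply any minimum charge."""
    rounded = math.ceil(runtime_s / granularity_s) * granularity_s
    return max(rounded, minimum_s)


def job_cost(runtime_s, nodes, price_per_node_hour, granularity_s, minimum_s=0):
    """Cost of running `nodes` machines for `runtime_s` seconds under a billing scheme."""
    seconds = billed_seconds(runtime_s, granularity_s, minimum_s)
    return nodes * seconds * price_per_node_hour / 3600


if __name__ == "__main__":
    # Hypothetical workload: 100 nodes for 10 seconds at a placeholder price.
    runtime_s, nodes, price = 10, 100, 0.36
    schemes = {
        "per-hour (1 hour minimum)": (3600, 3600),
        "per-minute": (60, 0),
        "per-second": (1, 0),
    }
    for name, (granularity_s, minimum_s) in schemes.items():
        cost = job_cost(runtime_s, nodes, price, granularity_s, minimum_s)
        print(f"{name}: {cost:.2f}")
```

Whatever the real price is, the ratio is what matters: with a 1-hour minimum, a 10-second job is billed for 360 times the seconds it actually used.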

dharbin 3821 days ago

I don't know about Manta, but this is the entire point of HDFS. It is easier to move code than data.
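
A back-of-the-envelope sketch of why that holds, in Python. This is a simplified model, not a description of how HDFS or Manta actually schedule work, and the sizes are made up. `bytes_over_network` is a hypothetical helper that counts only the bytes that must cross the network in each approach.

```python
def bytes_over_network(input_bytes, result_bytes, code_bytes, compute_at_storage):
    """Bytes that cross the network for one job, under a deliberately simple model.

    compute_at_storage=True:  ship the code to the node holding the data,
                              then ship only the result back.
    compute_at_storage=False: ship the whole input to a separate compute node,
                              then ship the result back.
    """
    if compute_at_storage:
        return code_bytes + result_bytes
    return input_bytes + result_bytes


if __name__ == "__main__":
    # Hypothetical job: 1 TB of input, 10 MB of results, a 1 MB program.
    input_bytes = 10**12
    result_bytes = 10 * 10**6
    code_bytes = 10**6

    move_code = bytes_over_network(input_bytes, result_bytes, code_bytes, True)
    move_data = bytes_over_network(input_bytes, result_bytes, code_bytes, False)
    print("move code to data:", move_code, "bytes")
    print("move data to code:", move_data, "bytes")
```

The gap shrinks as the result set approaches the size of the input set, since the result has to travel either way.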

zeristor 3820 days ago

Indeed, but they're having such fun. Let's leave them be.

zbjornson 3821 days ago

Hadn't heard of it, looks cool. Thanks for the tip :)
