| HN Mirror



	by virmundi 3459 days ago
	One nice thing about Hadoop is that you get distributed apps for free. On my current project, we only read about 50 TB of data per run across 70ish fraud models. Some read 1.5 TB, others only 20 GB. On a single system, that kind of data reading would require some smart I/O partitioning across the various models (several read the 1.5 TB data, all read the same 20 GB [after 20 GB you're looking at history that expands to the 1.5 TB]). With Hadoop, even just MapReduce and Cascading, you can spin all of that work out to multiple computers. Since the data is copied over multiple drives on those multiple computers, the I/O and general scheduling are handled for us. In the end, it makes everything simpler. If something fails due to network hiccups or disk failures, Hadoop moves the job to another machine and starts it again.
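
	To make the "moves the job and starts it again" part concrete, here is a toy sketch in Python. It is not Hadoop's scheduler or any Hadoop API; the block names, node names, and the `run_task` and `schedule` helpers are all made up for illustration. The only idea it shows is that when each block has copies on several machines, a failed task can simply be rerun on another machine that holds a copy.

```python
# Hypothetical layout: block id -> nodes that hold a replica of that block.
REPLICAS = {
    "block-0": ["node-a", "node-b", "node-c"],
    "block-1": ["node-b", "node-c", "node-d"],
    "block-2": ["node-a", "node-c", "node-d"],
}


def run_task(block, node, failing_nodes):
    """Pretend to process a block on a node; raise if that node is down."""
    if node in failing_nodes:
        raise IOError(f"{node} failed while reading {block}")
    return f"{block} processed on {node}"


def schedule(replicas, failing_nodes):
    """Run each block's task on a replica holder, retrying elsewhere on failure."""
    results = {}
    for block, nodes in replicas.items():
        for node in nodes:  # try the replica holders in listed order
            try:
                results[block] = run_task(block, node, failing_nodes)
                break
            except IOError:
                continue  # rerun the task on the next node with a copy
        else:
            # Every node holding this block failed.
            raise RuntimeError(f"no healthy replica for {block}")
    return results


# Simulate node-a being down: its tasks get rerun on other replica holders.
results = schedule(REPLICAS, failing_nodes={"node-a"})
for line in results.values():
    print(line)
```

	The real thing also handles placement, speculative execution, and a lot more, but the point stands: the retry logic lives in the framework, not in each of the 70ish models.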