HN Mirror



	by x0x0 4210 days ago
	I've used Hadoop at petabyte scale (2+ PB of input; 10+ PB sorted during the job) for machine learning tasks. If you have such a thing on your resume, you will be inundated with employers who have "big data", and at least half of them have under 50 GB, with a good chunk of those under 10 GB. You'll also see multiple (shitty) 16-machine clusters, any of which could be destroyed, for any task, by code running on a single decent server with SSDs. Let alone Hadoop jobs running on EMR, which is glacially slow (slow disk, slow network, slow everything). Also, Hadoop is so painfully slow to develop in that it's practically a full employment act for software engineers. I imagine it's similar to early EJB coding.

3 comments

sedachv 4210 days ago

> Also, hadoop is so painfully slow to develop in it's practically a full employment act for software engineers.

It's comical how bad Hadoop is compared even to the CM Lisp described in Daniel Hillis' PhD dissertation. How do you devolve all the way from that down to "It's like map/reduce. You get one map and one reduce!"
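To make the "one map and one reduce" complaint concrete, here is a minimal in-memory sketch of the MapReduce programming model in Python. The names `run_job`, `wc_map`, `wc_reduce`, `hist_map`, and `hist_reduce` are hypothetical illustrations, not Hadoop's API. The point is only the shape: each job is one map, one group-by-key shuffle, and one reduce, so even a two-step question (word counts, then a histogram of those counts) has to be written as two chained jobs.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """One hypothetical MapReduce 'job': a single map phase,
    a shuffle that groups values by key, and a single reduce phase."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort keys to mimic the sorted shuffle between the map and reduce phases.
    return [(key, reducer(key, groups[key])) for key in sorted(groups)]


# Job 1: word count.
def wc_map(line):
    for word in line.split():
        yield word.lower(), 1


def wc_reduce(word, counts):
    return sum(counts)


# Job 2: how many distinct words occur exactly N times.
# This needs a second full job because one job allows only one map and one reduce.
def hist_map(pair):
    word, count = pair
    yield count, 1


def hist_reduce(count, ones):
    return sum(ones)


lines = ["the cat sat", "the cat ran", "a dog sat"]
counts = run_job(lines, wc_map, wc_reduce)
histogram = run_job(counts, hist_map, hist_reduce)
print(counts)     # [('a', 1), ('cat', 2), ('dog', 1), ('ran', 1), ('sat', 2), ('the', 2)]
print(histogram)  # [(1, 3), (2, 3)]
```

On a real cluster, each chained job typically writes its intermediate output back to the distributed filesystem before the next one starts, which is part of why multi-step pipelines feel so heavy compared with composing data-parallel operations freely.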


JabavuAdams 4210 days ago

Programming is very faddish. It's amazing how bad commonly used technologies are. I'm so happy I'm mostly a native developer and don't have to use the shitty web stack and its shitty replacements.


sedachv 4209 days ago

What really puzzles me is that Doug Cutting worked at Xerox PARC, and Mike Cafarella has two (!) CS master's degrees and a PhD and is a professor at the University of Michigan. It's not like they were unaware of the previous work in the field (Connection Machine languages, Paralations, NESL).


jbergens 4208 days ago

It sounds a little bit like BizTalk :-)


IndianAstronaut 4210 days ago

IMHO, Hadoop only makes sense at 100 TB or more. Anything smaller can easily be handled by other tools.
