| HN Mirror

	by ap22213 3670 days ago
	I think the missed point is that Spark is very easy. I can get an average Java or Python developer trained up on it in less than a day. The Python shell is very simple to use out of the box. And it's incredibly convenient to be able to run either locally or on a huge cluster. I can use the same code to process batch jobs from 1 MiB to 100 TiB. In my mind, it's just a cost saving. Developer time is expensive, and it's hard to find great developers. Hardware is cheap. No way am I a scalability expert, and I really don't have time to be one. I started using Spark when I had to sort 10 TiB on disk, because it had scored the highest on sorting performance. I had struggled to implement a fast disk sort myself in the time I had, so I gave Spark a whirl, and it fixed my problem, fast. Since then, I've found it useful in a lot of other ways.
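
To make the "same code, locally or on a cluster" point concrete, here is a minimal PySpark sketch of a line sort. It is not the code I actually ran. The `SPARK_MASTER` environment variable, the `choose_master` helper, and the app name are hypothetical names made up for this illustration, and it assumes plain text input with one record per line.

```python
import os


def choose_master(env=os.environ):
    """Pick the Spark master URL.

    SPARK_MASTER is a hypothetical variable used only in this sketch.
    When it is unset, fall back to local mode on all available cores.
    """
    return env.get("SPARK_MASTER", "local[*]")


def sort_text_file(input_path, output_path):
    """Sort the lines of a text dataset and write the result as text files."""
    # Imported lazily so choose_master() works even without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sort-sketch")    # illustrative name
        .master(choose_master())   # local[*] on a laptop, a cluster URL otherwise
        .getOrCreate()
    )
    try:
        # spark.read.text yields one row per line in a column named "value".
        lines = spark.read.text(input_path)
        # orderBy performs a distributed sort; the job logic is the same
        # whether the input is a small file or a large directory of files.
        lines.orderBy("value").write.mode("overwrite").text(output_path)
    finally:
        spark.stop()
```

Calling `sort_text_file("in.txt", "out")` on a laptop and pointing the same function at a cluster master would differ only in the master URL and the paths, which is the convenience I mean.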