by orangejewce 3118 days ago
Spark is not a buzzword if you do analytics; it's the preferred platform, especially over native Hadoop. Honestly, most developers have no idea what the heck Spark is or how to use it. NoSQL and Kafka, yes, but analytics and machine learning are still far too complicated for most to stand up on their own.
2 comments

Spark has a place: at large scale. For hundreds of GB to a few TB of data, PostgreSQL works very well. At least, it does for my team. I don't want Spark, Kafka, NoSQL or any other modern fad near my team's data. It's just not appropriate.
Kafka was useful in one project because of its semantics for what amounts to writing to a file from network sockets.

Spark is a terribly inefficient solution to any known/stable data processing or analytics jobs. If you want a common format to trade with buddies, it's useful now. I expect something else will come along to replace that fad tech.

>Spark is a terribly inefficient solution to any known/stable data processing or analytics jobs.

Can you expand on that?

>Spark is a terribly inefficient solution to any known/stable data processing or analytics jobs.

You have a typo there.

Spark is a terribly inefficient solution to any known/stable data processing or analytics jobs I have ever come across.

There, fixed that for you.

I don't think that's fixed, because it's not what I assert.

Any job that is predictable can be done faster and cheaper with some C (or Go) and ad-hoc delegation to load-balanced VMs. Spark is difficult to optimize for consistent processes, uses resources poorly, and requires specialized knowledge. Just pay for someone who has done a little embedded programming, and stop creating buzzword jobs because a prototype went to production without being compared against alternatives.