by threeseed
2899 days ago
I've been using Spark for many years, going back to 1.0. It is the foundation technology of almost every data science team around the world, and your misguided post (which for some weird reason focuses only on graph algorithms) doesn't change that. I'm also not sure why you think it's inefficient. We run 30-node autoscaling clusters that stay close to 100% most of the time.
I have exactly zero knowledge of Spark's efficiency, and zero of how representative graph algorithms are, but I can confidently say that the above statement fails to refute the referenced article's thesis (which, arguably, criticizes assertions just like that).
The fact that your implementation scales (even autoscales) to use more compute resources says nothing about its efficiency, whether overall or marginal when adding more nodes (i.e. the shape of the scaling curve).
Computer science has struggled to achieve even near-linear scalability ever since the advent of SMP.
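
To make the "shape of the curve" point concrete, here is a rough sketch using Amdahl's law as a stand-in model. The 5% serial fraction is a made-up number for illustration, not a measurement of Spark or of anyone's cluster, and the function names are my own.

```python
def amdahl_speedup(nodes: int, serial_fraction: float) -> float:
    """Speedup over one node under Amdahl's law.

    serial_fraction is the share of the work that cannot be parallelized
    (a hypothetical input here, not something measured).
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


def parallel_efficiency(nodes: int, serial_fraction: float) -> float:
    """Overall efficiency: speedup divided by the number of nodes used."""
    return amdahl_speedup(nodes, serial_fraction) / nodes


def marginal_speedup(nodes: int, serial_fraction: float) -> float:
    """Extra speedup gained by adding one more node to `nodes` nodes."""
    return (amdahl_speedup(nodes + 1, serial_fraction)
            - amdahl_speedup(nodes, serial_fraction))


if __name__ == "__main__":
    s = 0.05  # assumed serial fraction, purely illustrative
    for n in (1, 2, 4, 8, 16, 30):
        print(f"nodes={n:2d}  "
              f"speedup={amdahl_speedup(n, s):6.2f}  "
              f"efficiency={parallel_efficiency(n, s):5.2f}  "
              f"marginal={marginal_speedup(n, s):5.3f}")
```

In this model every node can be fully busy while the efficiency column still falls as nodes are added, which is why cluster utilization alone doesn't settle the question.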