by hcrisp
3804 days ago
Impressive, but it seems like an inversion of paradigms. A small data-to-compute ratio is usually associated with high-performance computing (HPC). Why use Spark when the data is small and is broadcast to every worker? You have to pay the serialization and deserialization penalties of moving the data from Python to the JVM and back again. In fact, the JVM isn't really needed here at all, since all the computation is done in the pure-Python workers in an embarrassingly parallel way. It seems to me that you could just move onto an HPC cluster, use TensorFlow within an IPython.parallel setup, and be done much sooner.
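To make the "embarrassingly parallel, no JVM" point concrete, here is a minimal sketch. It is not the article's code, and it swaps in the standard library's `concurrent.futures` for IPython.parallel to stay self-contained. The `evaluate` function and its toy scoring are hypothetical stand-ins for training one TensorFlow model per hyperparameter setting; the small dataset is simply handed to every task, which plays the role of the broadcast.

```python
from functools import partial


def evaluate(data, learning_rate):
    """Hypothetical stand-in for training and scoring one model.

    A real version would build and fit a model here; this toy score
    only makes the example runnable (lower is better).
    """
    return sum((x * learning_rate - 1.0) ** 2 for x in data)


def grid_search(executor, data, learning_rates):
    """Score every setting independently and return the best one.

    `executor` is any concurrent.futures executor. Each task receives
    the same small `data`, and tasks never talk to each other, so the
    work is embarrassingly parallel.
    """
    scores = list(executor.map(partial(evaluate, data), learning_rates))
    best = min(range(len(scores)), key=scores.__getitem__)
    return learning_rates[best], scores[best]


# For real parallelism across cores, pass a process pool, e.g.:
#   from concurrent.futures import ProcessPoolExecutor
#   with ProcessPoolExecutor() as pool:
#       grid_search(pool, data, learning_rates)
```

Everything stays in Python processes, so the only serialization cost is pickling the small dataset once per task. The same shape should carry over to a cluster scheduler, assuming it exposes a comparable `map` call.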