by scott_s 3347 days ago
This is a good paper, but not quite how I think about it. I use the terms data-parallel (for SIMD), task-parallel (for fork-join; kinda) and message passing. GPUs are basically data-parallel machines, but over the years, GPUs have been getting more and more capable, so I imagine some people out there are using them for task-parallel workloads.
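To make the first two terms concrete, here is a minimal sketch using Python's standard `concurrent.futures`. The function names (`square`, `data_parallel`, `task_parallel`) are made up for illustration, and a thread pool only shows the shape of each model; it is not real SIMD and won't behave like a GPU.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    return x * x

def data_parallel(xs):
    # Data-parallel shape: the SAME operation is applied to every element.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, xs))

def task_parallel(xs):
    # Task-parallel (fork-join) shape: fork DIFFERENT tasks, then join on both.
    with ThreadPoolExecutor() as pool:
        total = pool.submit(sum, xs)      # fork task 1
        largest = pool.submit(max, xs)    # fork task 2
        return total.result(), largest.result()  # join

print(data_parallel([1, 2, 3]))   # [1, 4, 9]
print(task_parallel([1, 2, 3]))   # (6, 3)
```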

Would TensorFlow (or similar) count as task-parallel because the computation graph is a DAG? If so, there's a pretty popular example of task-parallelism running on GPUs.
I would say TensorFlow is a hybrid of two strategies: SIMD and dataflow/DAG. (I wouldn't say fork-join and dataflow/DAG are synonymous; rather they are related but different models/APIs).

At the level of a single node, TensorFlow uses Eigen [1]. Eigen is like BLAS, but it's a C++ template library rather than Fortran. It compiles to various flavors of SIMD. Nvidia's proprietary CUDA is the SIMD flavor most commonly used by TensorFlow programs.

At the level of multiple nodes, TensorFlow derives a program graph from your Python source code, using high level "ops", in the style of NumPy. Then it distributes the ops across a cluster using a scheduler:

Quote: Its dataflow scheduler, which is the component that chooses the next node to execute, uses the same basic algorithm as Dryad, Flume, CIEL, and Spark. [2]

Python is the "control plane" and not the "data plane" -- it describes the logic and dataflow of the program, but doesn't touch the actual data. When you use NumPy, the C code and BLAS code are the data plane. When you use TensorFlow, Eigen and the gRPC/protobuf distribution layer are the data plane.

So you can have a big data dataflow system WITHOUT SIMD, like the four systems mentioned in the quote. And you can have SIMD without dataflow, i.e. if you are doing it in pure Eigen or procedural/functional R/Matlab/Julia on a single machine. Languages like R and Julia may have dataflow extensions, but they're single-threaded/procedural by default as far as I know.

A mathematical way to think of the DAG model is that your program uses a partial order on computations rather than a total order (the procedural model) -- this is what gives you parallelism: any two computations that the partial order doesn't relate can run at the same time.
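A rough sketch of the partial-order idea, using `graphlib` from the Python standard library (3.9+). The graph `deps` and the helper `parallel_waves` are hypothetical; this is not how TensorFlow's scheduler is implemented, just an illustration of which nodes a DAG leaves unordered.

```python
from graphlib import TopologicalSorter

# Hypothetical computation graph: each node maps to the set of nodes it depends on.
deps = {
    "a": set(),
    "b": set(),
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"c", "d"},
}

def parallel_waves(deps):
    """Group nodes into waves; nodes within a wave are unordered w.r.t. each other."""
    ts = TopologicalSorter(deps)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every node whose inputs are all done
        waves.append(ready)
        ts.done(*ready)                 # mark the whole wave as finished
    return waves

print(parallel_waves(deps))  # [['a', 'b'], ['c', 'd'], ['e']]
```

A procedural program would fix one total order such as a, b, c, d, e. The DAG only says that c and d come after their inputs, so a scheduler is free to run a and b together, then c and d together.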

So TensorFlow uses both SIMD and dataflow.

[1] http://eigen.tuxfamily.org/index.php?title=Main_Page

[2] http://download.tensorflow.org/paper/whitepaper2015.pdf

Good point! Which reminds me that I left off pipeline-parallelism, which is very common in dataflow programming models. And TensorFlow is a dataflow model. But I think that the core computation in TensorFlow programs will tend to be largely data-parallel affairs. That is, I think such programs tend to have a bunch of data-parallel computational kernels connected in a DAG. When I made that comment, I was thinking more of a Cilk-style program.

(I work on a dataflow language and system.)
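For anyone unfamiliar with pipeline-parallelism, a minimal sketch with standard-library threads and queues follows. The names (`stage`, `run_pipeline`, `DONE`) are invented for the example, and it assumes one thread per stage with FIFO queues between them; real dataflow systems are considerably more involved.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream

def stage(fn, inbox, outbox):
    # One pipeline stage: pull an item, transform it, pass it downstream.
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))

def run_pipeline(items, fns):
    # One queue between each pair of stages, plus input and output queues.
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10]))  # [20, 30, 40]
```

While the second stage is working on item 1, the first stage can already be working on item 2; that overlap between stages is the parallelism.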

I'd say no, since the purpose of the GPU there is to make the matmul really fast.