by jamesblonde 3650 days ago
No, you're not. Google did this with their build engine (Blaze internally; Bazel is the open-source API, which lacks a distributed build platform). Google is doing the same with Apache Beam (the API to Google Dataflow): releasing an API for local testing but not releasing the distributed engine.

If you have your data in a Hadoop cluster and are doing image recognition, Yahoo's CaffeOnSpark is the only truly distributed engine out there. It uses MPI to share model state between executors.

1 comment

Keep in mind there are different kinds of parallelism, though. If you mean model parallelism, a lot of shops are doing that via RDMA as well as MPI. How well it works depends on how you handle state.

There's also data parallelism with parameter averaging, which we've been doing in deeplearning4j for the last few years. We also support a lot more than just images. We have the ETL pipelines (Kafka etc.) to go with it. Watch for a blog post from us on Parallel Forall (NVIDIA's blog) where we explain some of this.
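
For anyone who hasn't seen parameter averaging before, here is a minimal single-process sketch in NumPy. This is not deeplearning4j's API or implementation; the function names (`local_sgd`, `parameter_averaging`), the toy least-squares model, and all hyperparameters are assumptions made up for illustration. Each "worker" trains on its own data shard starting from the shared parameters, and the driver averages the results after every round.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=5):
    """Run a few gradient steps of least-squares regression on one worker's shard."""
    w = w.copy()  # each worker starts from the shared parameters
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def parameter_averaging(X, y, n_workers=4, rounds=20, lr=0.1, steps=5):
    """Hypothetical driver: shard the data, train locally, average each round."""
    shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))
    w = np.zeros(X.shape[1])
    for _ in range(rounds):
        # In a real cluster these calls would run in parallel on separate executors.
        local = [local_sgd(w, Xs, ys, lr, steps) for Xs, ys in shards]
        # Synchronization step: the new shared parameters are the plain average.
        w = np.mean(local, axis=0)
    return w

# Toy, noise-free data so the result is easy to check.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

w_hat = parameter_averaging(X, y)
print(w_hat)
```

The knob that matters in practice is how many local steps you take between averaging rounds: more steps means less communication, but the workers' parameters drift further apart before they are synchronized.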

I also gave a framework-agnostic overview of the concepts you should consider when looking at distributed deep learning:

http://www.slideshare.net/agibsonccc/brief-introduction-to-d...