by nate_martin 3694 days ago
Maybe someone who works on deep learning could comment on what this provides vs. other open source systems like Theano, TensorFlow, Torch, etc.
3 comments

They claim it's twice as fast as TensorFlow, which is not blow-you-out-of-the-water (compare that to the roughly 50x speedup you get from moving to a GPU in most cases), but it's a solid speedup.

It's easily parallelizable across GPUs, or so the claim goes.

Its configuration language is much, much shorter than Caffe's, but upon inspection it looks like the configuration language is also much less flexible than Caffe's, and they implemented a damn sight less stuff. Just off the top of my head: no recurrent anything, no LSTM, none of the gating you would need for LSTM, and no residual net stuff.

The docs look much, much less complete than those for TF, Theano, and the like. For example, the user docs give the dropout probability, but the actual documentation for the dropout feature is hidden away inside the repo.

The important thing, however, is that they claim a significant improvement when training on extraordinarily sparse datasets, like those in recommender systems. It seems very specialized for that exact purpose: for instance, it only accepts data in NetCDF format, which is common enough in climatology-land but less common in machine learning-land proper.
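
To make the sparse-data point concrete, here is a toy sketch in plain Python. It is not DSSTNE's implementation; the function names and the tiny made-up weight matrix are assumptions for illustration only. It shows why a recommender-style input (a user who touched 3 items out of 1000) rewards code that only visits the nonzero entries.

```python
# Toy comparison of a dense vs. sparse first-layer forward pass.
# Hypothetical example: names and data are invented for illustration.

def dense_forward(weights, x):
    """Multiply every weight by every input entry, zeros included."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def sparse_forward(weights, active_indices):
    """Input is a 0/1 vector stored as the indices of its nonzero entries,
    so each output only sums the weights at those indices."""
    return [sum(row[j] for j in active_indices) for row in weights]

n_items = 1000                      # catalog size (assumed)
active = [4, 250, 999]              # items this user interacted with (assumed)
weights = [[(i + j) % 7 for j in range(n_items)] for i in range(3)]

# Build the equivalent dense 0/1 input vector for comparison.
x_dense = [0] * n_items
for j in active:
    x_dense[j] = 1

dense_out = dense_forward(weights, x_dense)     # touches 3 * 1000 weights
sparse_out = sparse_forward(weights, active)    # touches 3 * 3 weights
density = len(active) / n_items
```

Both paths give the same output, but the sparse one reads 9 weights instead of 3000, which is roughly the kind of saving a sparse-aware engine would be chasing.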

The test coverage... To a first approximation, there is no test coverage. It seems quite research project-y.

One important difference is model-parallel training. From the FAQ:

DSSTNE instead uses “model-parallel training”, where each layer of the network is split across the available GPUs so each operation just runs faster. Model-parallel training is harder to implement, but it doesn’t come with the same speed/accuracy trade-offs of data-parallel training.

https://github.com/amznlabs/amazon-dsstne/blob/master/FAQ.md
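
As a rough illustration of what the FAQ means by splitting a layer across GPUs, here is a minimal sketch in plain Python. The "devices" are just list shards and every name here is hypothetical; it says nothing about how DSSTNE actually partitions or communicates.

```python
# Toy model-parallel forward pass: one layer's weight matrix is split
# by output rows across simulated "devices". No real GPUs involved.

def matvec(weights, x):
    """Dense matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def split_rows(weights, n_devices):
    """Split the weight rows into contiguous shards, one per device."""
    per_device = -(-len(weights) // n_devices)  # ceiling division
    return [weights[i:i + per_device]
            for i in range(0, len(weights), per_device)]

def model_parallel_forward(weights, x, n_devices):
    """Each simulated device computes its own slice of the layer output."""
    shards = split_rows(weights, n_devices)
    partial_outputs = [matvec(shard, x) for shard in shards]  # concurrent on real hardware
    return [y for part in partial_outputs for y in part]      # gather the slices in order

W = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [2, 2, 2]]
x = [1, 2, 3]

full = matvec(W, x)
sharded = model_parallel_forward(W, x, n_devices=2)
```

The sharded result matches the single-device result exactly, which is the point the FAQ is making: every device works on the same batch, so the math is unchanged, unlike data-parallel schemes where the batch itself is split across devices.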

They claim to perform much better on sparse data sets. "DSSTNE is much faster than any other DL package (2.1x compared to Tensorflow in 1 g2.8xlarge) for problems involving sparse data". It also has good support for distributing the computation over multiple GPUs. Theano, for example, can't do anything like that. On the other hand, using JSON to design my models sounds much worse than using a programming language.