
Yes, unfortunate that MXNet wasn't covered. It's in the happy Venn place of (fully cross-platform) ∩ (easy to embed) ∩ (flexible) ∩ (hackable).

* Cross-platform: Windows, macOS, Linux; CPU and CUDA. Though their CMake build needs work.

* Easy to embed: straightforward C FFI, JSON for metadata and parameter serialization, no weird runtime.

* Flexible: not too specialized to vision. Static unrolling of RNNs is possible now (with mirroring this can still be very memory efficient [0]), and there is basic support for the fast new cuDNN 5 RNN layers [1] (contributed by a colleague of mine). Dynamic unrolling is on the horizon, I hear.

* Hackable: once you're familiar with the codebase, custom elementwise unary or binary ops take a few minutes and custom layers take 1+ hours (depending on complexity). And if you can leverage mshadow primitives for your layer implementation, you don't even have to touch CUDA. The project is also fairly active on GitHub, responsive to PRs, etc.
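
For anyone unsure what "static unrolling" means in the Flexible bullet, here is a rough sketch in plain Python. It is not MXNet code and uses none of its API; `rnn_step` and `unroll_static` are hypothetical names made up for illustration. The idea is just that the sequence length is fixed when the graph is built, and every time step reuses the same weights.

```python
import math


def rnn_step(x, h, w_xh, w_hh, b):
    """One vanilla RNN step: h_next[j] = tanh(w_xh[j].x + w_hh[j].h + b[j])."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hi for w, hi in zip(w_hh[j], h))
            + b[j]
        )
        for j in range(len(b))
    ]


def unroll_static(seq_len, w_xh, w_hh, b):
    """Build a forward function for exactly seq_len steps (hypothetical helper).

    The same weight objects are shared by every step, which is what makes
    this an unrolled RNN rather than seq_len independent layers.
    """
    def forward(xs, h0):
        # The length is baked in at build time; other lengths are rejected.
        if len(xs) != seq_len:
            raise ValueError("expected %d steps, got %d" % (seq_len, len(xs)))
        h = h0
        states = []
        for t in range(seq_len):
            h = rnn_step(xs[t], h, w_xh, w_hh, b)
            states.append(h)
        return states

    return forward


# Usage: a 1-unit RNN unrolled for 3 steps.
net = unroll_static(3, w_xh=[[1.0]], w_hh=[[0.0]], b=[0.0])
states = net([[0.5], [0.0], [-0.5]], h0=[0.0])
```

With dynamic unrolling the loop bound would come from the input at run time instead of being fixed up front, so one graph could serve sequences of different lengths.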

[0] https://arxiv.org/pdf/1606.03401.pdf

[1] https://devblogs.nvidia.com/parallelforall/optimizing-recurr...