https://news.ycombinator.com/item?id=7803101
I will also add that looking into Hessian-free optimization for training feed-forward nets, instead of conjugate gradient/L-BFGS/SGD, has proven to be amazing [1].
I'm still playing with recursive nets, but going by Socher's work [2], L-BFGS was used for them just fine.
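For anyone unfamiliar with the Hessian-free idea, here is a rough sketch of a single update, not the method from [1]. It assumes a toy logistic model in place of a real feed-forward net, approximates Hessian-vector products by finite differences of the gradient (practical implementations usually use exact Gauss-Newton products instead), and uses a fixed damping term. All names (`loss_and_grad`, `hessian_free_step`, `data`) are made up for illustration. The point is that the Hessian is never formed: an inner conjugate-gradient loop only ever asks for products H*v.

```python
import math

# Toy, non-separable dataset: (bias, feature) -> label. Purely illustrative.
data = [((1.0, -2.0), 0), ((1.0, -1.0), 0), ((1.0, -0.5), 1),
        ((1.0, 0.5), 0), ((1.0, 1.0), 1), ((1.0, 2.0), 1)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_and_grad(w, data):
    """Mean logistic loss and its gradient for a linear model."""
    n = len(data)
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-dot(w, x)))
        loss -= (y * math.log(p) + (1 - y) * math.log(1.0 - p)) / n
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi / n
    return loss, grad

def hessian_vector_product(w, v, data, eps=1e-5):
    """Approximate H*v as (grad(w + eps*v) - grad(w)) / eps."""
    _, g0 = loss_and_grad(w, data)
    _, g1 = loss_and_grad([wi + eps * vi for wi, vi in zip(w, v)], data)
    return [(a - b) / eps for a, b in zip(g1, g0)]

def conjugate_gradient(apply_A, b, iters=10, tol=1e-10):
    """Solve A x = b using only matrix-vector products apply_A(v)."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # current search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def hessian_free_step(w, data, damping=1e-3):
    """One update: solve (H + damping*I) d = -grad with CG, then w <- w + d."""
    _, g = loss_and_grad(w, data)

    def apply_A(v):
        hv = hessian_vector_product(w, v, data)
        return [h + damping * vi for h, vi in zip(hv, v)]

    d = conjugate_gradient(apply_A, [-gi for gi in g])
    return [wi + di for wi, di in zip(w, d)]

def train(w, data, steps=5):
    for _ in range(steps):
        w = hessian_free_step(w, data)
    return w
```

Calling `train([0.0, 0.0], data)` runs a few of these steps. A real setup would add a line search or adaptive damping, which this sketch leaves out.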
[1]: http://www.cs.toronto.edu/~rkiros/papers/shf13.pdf
[2]: http://socher.org/