by Gladdyu 2743 days ago
You could express an arbitrary (well-behaved) ODE as a neural network by discretizing it in timesteps, but you would not extract any additional parallelism.

In normal ODEs you can already compute each dX_{i}(t)/dt in parallel, but you still have to evaluate layer/time t completely before evaluating layer/time t+Δt, since each step feeds forward into the next in both the NN and the ODE case.
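
To make the point above concrete, here is a minimal sketch in plain Python, assuming forward Euler as the discretization and a two-component harmonic oscillator as the test ODE (both are my choices for illustration, not something from the thread). Each Euler step has the shape of a residual layer, x + dt * f(x, t). The components inside one step are independent of each other, but the loop over steps is inherently sequential.

```python
import math

def rhs(x, t):
    """Right-hand side of a hypothetical test ODE: x0' = x1, x1' = -x0."""
    # Every entry depends only on the state at the current time, so the
    # entries could be computed in parallel within a single step.
    return [x[1], -x[0]]

def euler_layer(x, t, dt):
    """One forward-Euler step, read as one residual 'layer': x + dt * f(x, t)."""
    dx = rhs(x, t)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]

def integrate(x0, t_end, n_steps):
    """Apply the layers in order; step k needs the full output of step k-1."""
    dt = t_end / n_steps
    x, t = list(x0), 0.0
    for _ in range(n_steps):  # sequential in time, like depth in a network
        x = euler_layer(x, t, dt)
        t += dt
    return x

if __name__ == "__main__":
    x, v = integrate([1.0, 0.0], 1.0, 10000)
    # The exact solution for this initial condition is (cos t, -sin t).
    print(x, math.cos(1.0))
    print(v, -math.sin(1.0))
```

Stacking more steps only makes the "network" deeper; it does not expose any parallelism beyond what is already available inside `rhs`.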

3 comments

You could, but it doesn't work very well. It's very, very slow. We built a package for it in Julia to test it out, but it wasn't competitive against any of our other methods. Our writeup is here:

https://julialang.org/blog/2017/10/gsoc-NeuralNetDiffEq

I think it's premature to write off using neural networks to accelerate solving ODEs based on this study. There are quite a few ways to formulate the problem and there's been some promising work in this area recently, e.g., using neural networks to approximate subgrid processes in climate models: https://www.pnas.org/content/115/39/9684
What you show is something completely different though. Using neural networks to automatically learn some way to formulate a model? That sounds reasonable. Using neural networks to solve a given ODE? That doesn't seem to work well in the cases we tried.
How would you define “using neural networks to solve a given ODE”?

I’ll certainly agree that it doesn’t make sense to use a single neural net like function to model the full solution of an ODE. But the entire power of deep learning is that it doesn’t force you to use a single approach — you can compose neural nets with any function you like as long as it’s differentiable. So I think hybrid models that blend deep learning with traditional numerical methods are entirely fair game.

>I’ll certainly agree that it doesn’t make sense to use a single neural net like function to model the full solution of an ODE.

That's exactly how I'm defining using neural networks to solve a given ODE, and yes, our studies show that it's not a practical method. It was just a good Google Summer of Code project where we coded up a version in TensorFlow, the student did the same thing in Knet.jl, and then we played around with a bunch of modifications (to the error function, allowing adaptive training, etc.) and ended up convincing ourselves this wasn't a viable method. However, the next steps we are taking are blending deep learning with traditional numerical methods. A post earlier up shows that our differential equation software now blends with the deep learning software, and we have a few projects investigating different strategies for actually using these combinations. We should have results going onto arXiv late January showing some promising mixed strategies.

Multiple-shooting methods [1] for ODEs are amenable to parallelism in time.

[1] https://en.wikipedia.org/wiki/Direct_multiple_shooting_metho...

Yes and no: there is parallel time integration for ODEs which parallelizes using multigrid techniques.
Yes, but tests tend to show you need like >64 cores for things like parareal methods to do better than standard serial methods. So they exist, but aren't quite practical yet. Maybe a GPU-based one in specific cases can be interesting, but no one has been able to demonstrate an efficient enough code yet. It's definitely an interesting topic.
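
For readers who haven't seen parareal, here is a rough sketch of the iteration in plain Python. The assumptions are mine: a scalar test problem dy/dt = -y, forward Euler for both propagators (1 step for the coarse one, 100 for the fine one), and a list comprehension standing in for the parallel workers. The function names are hypothetical and this is not tuned code, just the structure of the algorithm.

```python
def f(t, y):
    """Hypothetical test problem: dy/dt = -y."""
    return -y

def euler(y, t0, t1, n):
    """Forward Euler from t0 to t1 in n steps."""
    dt = (t1 - t0) / n
    t = t0
    for _ in range(n):
        y = y + dt * f(t, y)
        t += dt
    return y

def coarse(y, t0, t1):
    """Cheap propagator G: a single Euler step per time slice."""
    return euler(y, t0, t1, 1)

def fine(y, t0, t1):
    """Expensive propagator F: 100 Euler steps per time slice."""
    return euler(y, t0, t1, 100)

def parareal(y0, t_end, n_slices, n_iters):
    """Return the values at the slice boundaries after n_iters corrections."""
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]
    # Initial guess: one serial sweep with the coarse propagator.
    u = [y0]
    for n in range(n_slices):
        u.append(coarse(u[n], ts[n], ts[n + 1]))
    for _ in range(n_iters):
        # These fine solves depend only on the previous iterate, so each
        # slice could go to its own core. Here they simply run in a loop.
        f_old = [fine(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        g_old = [coarse(u[n], ts[n], ts[n + 1]) for n in range(n_slices)]
        # Serial but cheap correction sweep:
        # U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k)
        new = [y0]
        for n in range(n_slices):
            new.append(coarse(new[n], ts[n], ts[n + 1]) + f_old[n] - g_old[n])
        u = new
    return u
```

With as many iterations as slices, the result reproduces the serial fine solve, so any speedup has to come from converging in far fewer iterations than slices. That is roughly why the core count needed to beat a good serial method ends up being high.
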
Thanks for the perspective on the actual performance of these algorithms. So yeah, as an algorithmic fact it isn't necessary to proceed sequentially, though perhaps not performant to proceed in parallel either.

Anyway, I heard AMD has 64 cores on a single chip so it may not be too long..

Neuromorphic Supercomputer With 1 Million Cores Mimics the Human Brain

  https://www.tomshardware.com/news/human-brain-neuromorphic-supercomputer-manchester,38027.html
How do they cope with the interactions at the boundaries of the grid?