by ChrisRackauckas 2162 days ago
Yes, I think this is a great use for neural networks since they are effectively high-dimensional function approximators, and something like Schrödinger's equation is a PDE where the number of dimensions is the number of observables, so it can get very high-dimensional very fast. Classical methods don't necessarily scale that well in high dimensions (curse of dimensionality: cost is exponential in the number of dimensions), but neural networks do very well. This gives rise to the physics-informed neural network and deep backward stochastic differential equation approaches, which will likely drive a lot of future HPC applications in a way that blends physical equations with neural network approaches. We recently released a library, NeuralPDE [1], which utilizes a lot of these approaches to solve what were traditionally difficult equations in an automated form. I think the future is bright for scientific machine learning!

[1] https://neuralpde.sciml.ai/dev/
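
To make the curse of dimensionality mentioned above concrete, here is a minimal sketch in Python. It counts the nodes of a tensor-product grid with 10 points per dimension, then estimates a 100-dimensional integral by plain Monte Carlo sampling. Monte Carlo is used here only as a simple stand-in for a method whose sample count does not have to grow exponentially with the dimension; it is not a neural network method, and the function names are made up for illustration.

```python
import numpy as np

def grid_points(points_per_dim: int, dim: int) -> int:
    """Number of nodes in a tensor-product grid: points_per_dim ** dim."""
    return points_per_dim ** dim

def mc_mean_sq_norm(dim: int, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the integral of |x|^2 over the unit cube [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, dim))          # uniform samples in the cube
    return float(np.mean(np.sum(x**2, axis=1)))

# Grid size grows exponentially with the dimension.
for d in (1, 3, 10, 100):
    print(d, grid_points(10, d))

# 20,000 samples are enough for a rough estimate in 100 dimensions.
estimate = mc_mean_sq_norm(dim=100, n_samples=20_000)
print(estimate, 100 / 3)                      # the exact value is dim / 3
```

A grid with 10 points per axis in 100 dimensions has 10^100 nodes, while the sampling estimate uses a fixed number of points regardless of the dimension.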

3 comments

This is fascinating. ELI5: how does this work? (I couldn't find references on the linked site)

Let's say I supply a high-dimensional DAE, f(x', x, z) = 0, x(0) = x₀, where classical methods like quadrature are unwieldy. Does the algorithm generate n samples in the solution space by integrating n times and then fitting an NN? With different initial conditions? Or does it perform quadrature with NNs instead of polynomial basis functions?

A lot of these methods utilize the universal differential equation framework described here: https://arxiv.org/abs/2001.04385 . Specifically, the last example in this preprint describes how high-dimensional parabolic PDEs can be solved using neural networks inside of a specific SDE (derivation in the supplemental). Discrete physics-informed neural networks are also a subset of this methodology.
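
As a rough illustration of why an SDE helps with high-dimensional parabolic PDEs, the sketch below uses the Feynman-Kac formula for the simplest linear case only: u_t + 0.5 * Laplacian(u) = 0 with terminal condition u(T, x) = |x|^2, whose exact solution is |x|^2 + d * (T - t). This is not the method from the preprint (no neural network is involved, and nonlinear terms are not handled); the equation, the terminal condition, and the function name are assumptions chosen so the answer can be checked.

```python
import numpy as np

def feynman_kac_estimate(x0, t, T, n_paths=20_000, seed=0):
    """Estimate u(t, x0) for u_t + 0.5 * Laplacian(u) = 0 with u(T, x) = |x|^2.

    Feynman-Kac: u(t, x0) = E[ g(x0 + W_{T-t}) ], where W is a d-dimensional
    Brownian motion and g(x) = |x|^2 is the terminal condition.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    # W_{T-t} is Gaussian with covariance (T - t) * I, so no time stepping is needed here.
    w = np.sqrt(T - t) * rng.standard_normal((n_paths, x0.size))
    return float(np.mean(np.sum((x0 + w) ** 2, axis=1)))

d = 100
x0 = np.ones(d)
approx = feynman_kac_estimate(x0, t=0.0, T=1.0)
exact = float(x0 @ x0) + d * 1.0              # |x0|^2 + d * (T - t)
print(approx, exact)
```

The cost is one batch of random samples per evaluation point, with no grid over the 100-dimensional space.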

The other subset of methods, continuous physics-informed neural networks, are described in https://www.sciencedirect.com/science/article/pii/S002199911... .

For a very basic introduction, I wrote some lecture notes on how this is done for a simple ODE with code examples: https://mitmath.github.io/18S096SciML/lecture2/ml
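
A minimal sketch of the idea for a simple ODE, written in Python rather than taken from the lecture notes. It fits the trial solution u(t) = 1 + t * N(t) to u' = -u, u(0) = 1 by minimizing the ODE residual at collocation points. To keep it short and dependency-free, the hidden-layer weights are fixed at random and only the output weights are trained, which turns training into a linear least-squares problem; that simplification, the test equation, and all names are assumptions for illustration.

```python
import numpy as np

def solve_ode_random_features(n_features=20, n_collocation=50, seed=0):
    """Fit u(t) = 1 + t * N(t) to u' = -u, u(0) = 1 on [0, 1].

    N(t) = sum_j w_j * tanh(a_j * t + b_j), with a_j and b_j fixed at random
    (an assumption made here so that training reduces to linear least squares).
    """
    rng = np.random.default_rng(seed)
    a = 2.0 * rng.standard_normal(n_features)
    b = rng.standard_normal(n_features)
    t = np.linspace(0.0, 1.0, n_collocation)[:, None]    # collocation points, shape (n, 1)

    phi = np.tanh(a * t + b)                  # hidden features, shape (n, m)
    dphi = a * (1.0 - phi**2)                 # time derivative of each feature
    # Residual u' + u = 1 + sum_j w_j * (phi_j + t * dphi_j + t * phi_j), linear in w.
    A = phi + t * dphi + t * phi
    w, *_ = np.linalg.lstsq(A, -np.ones(n_collocation), rcond=None)

    def u(ts):
        ts = np.asarray(ts, dtype=float)[:, None]
        return 1.0 + ts[:, 0] * (np.tanh(a * ts + b) @ w)

    return u

u = solve_ode_random_features()
ts = np.linspace(0.0, 1.0, 11)
print(np.max(np.abs(u(ts) - np.exp(-ts))))    # error against the exact solution exp(-t)
```

The factor t in the trial solution enforces the initial condition exactly, so the loss only has to penalize the ODE residual.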

These methods are really interesting for high-dimensional PDE (like HJB), but there's a ton of skepticism about the applicability of NN models for solving the more common PDE that arise in physical sciences and engineering.

The tests are rarely equivalent, in that standard PDE technology can move to new domains, boundary conditions, materials, etc., without new training phases. If one needs to solve many nearby problems, there are many established techniques for leveraging that similarity. There is active research on ML to refine these techniques, but it isn't a silver bullet.

Far more exciting, IMO, is to use known methods for representing (reference-frame invariant and entropy-compatible) constitutive relations while training their form from observations of the PDE, and to do so using multiscale modeling in which a fine-scale simulation (e.g., atomistic or grain-resolving for granular/composite media) is used to train/support multiscale constitutive relations. In this approach, the PDEs are still solved by "standard" methods such as finite element or finite volume, and thus can be designed with desired accuracy and exact conservation/compatibility properties and generalize immediately to new domains/boundary conditions, but the trained constitutive models are better able to represent real materials.

A good overview paper on ML in the context of multiscale modeling: https://arxiv.org/pdf/2006.02619.pdf

Yes, and our recent work https://arxiv.org/abs/2001.04385 gives a fairly general form for how to mix known scientific structural knowledge directly with machine learning. In fact, some of these PDE solvers are just instantiations of specific choices of universal differential equations. I agree that in many cases the "fully uninformed" physics-informed neural network won't work well, but we need to fully optimize a library with all of the training techniques possible in order to prove that, which is what we plan to do. In the end, I think PINNs will be most applicable to (1) non-local PDEs where classical methods have not fared well, so things like fractional differential equations, and (2) very high dimensional PDEs, like 100's of dimensions, but paired with constraints on the architecture to preserve physical quantities and relationships. But of course, something like a fractional differential equation is not an example for the first pages of tutorials since they are quite niche equations to solve!

You've got a lot of broken references (??) in that preprint, BTW.

I think I understand why you're putting in the learned derivative operator, but I think it's rarely desirable. Computing derivatives with compatibility properties is a well-studied domain (e.g., finite element exterior calculus), as is tensor invariance theory (e.g., Zheng 1994, though this subject is sorely in need of a modern software-centric review). When the exact theory is known and readily computable, it's hard to see science/engineering value in "learned" surrogates that merely approximate the symmetries.

More generally, it is disheartening to see trends that would conflate discretization errors with modeling errors, lest it bring back the chaos of early turbulence modeling days that prompted this 1986 Editorial Policy Statement for the Journal of Fluids Engineering. https://jedbrown.org/files/RoacheGhiaWhite-JFEEditorialPolic...

>When the exact theory is known and readily computable, it's hard to see science/engineering value in "learned" surrogates that merely approximate the symmetries.

I completely agree, which is why the approach I am taking is to only utilize surrogates for things which are unknown or do not have an exact theory. I don't think surrogates will be more efficient than methods developed to exploit specific properties of the problem. In fact, I think the recent proof of convergence for PINNs simultaneously demonstrates this might be an issue (there was no upper bound to the proved convergence rate, but the one they could prove was low order).

>More generally, it is disheartening to see trends that would conflate discretization errors with modeling errors, lest it bring back the chaos of early turbulence modeling days that prompted this 1986 Editorial Policy Statement for the Journal of Fluids Engineering. https://jedbrown.org/files/RoacheGhiaWhite-JFEEditorialPolic....

Agree, this is a difficult issue with approaches that augment numerical approaches with data-driven components. There are ways to validate these trained components independent of the training data (i.e. by using other data), but validation will always be more difficult.

Cool example, thanks!

This is very cool.

I was thinking specifically of this and related approaches https://arxiv.org/abs/1909.08423 where they search for the ground state by iteratively using an MCMC sampler and doing SGD. The innovation is a network architecture that takes classic approaches from physics and judiciously replaces parts with flexible NNs.
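
The sample-then-descend loop described above can be sketched on a toy problem. The code below is not the architecture from the linked paper: it uses a one-parameter Gaussian trial wavefunction psi_a(x) = exp(-a * x^2) for the 1D harmonic oscillator (H = -0.5 * d^2/dx^2 + 0.5 * x^2), a random-walk Metropolis sampler, and plain gradient descent on a Monte Carlo estimate of the energy gradient. The problem, step sizes, and names are assumptions for illustration; the exact ground state has a = 0.5 and energy 0.5.

```python
import numpy as np

def local_energy(x, a):
    """Local energy (H psi) / psi for psi_a(x) = exp(-a * x^2)."""
    return a + x**2 * (0.5 - 2.0 * a**2)

def metropolis(x, a, rng, n_sweeps=20, step=1.0):
    """Random-walk Metropolis targeting |psi_a(x)|^2 = exp(-2 * a * x^2), one chain per walker."""
    for _ in range(n_sweeps):
        proposal = x + step * rng.uniform(-1.0, 1.0, size=x.shape)
        log_ratio = -2.0 * a * (proposal**2 - x**2)
        accept = np.log(rng.random(x.shape)) < log_ratio
        x = np.where(accept, proposal, x)
    return x

def optimize(a=1.2, n_walkers=2000, n_iters=200, lr=0.2, seed=0):
    """Alternate MCMC sampling and a gradient step on the variational parameter a."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_walkers)
    for _ in range(n_iters):
        x = metropolis(x, a, rng)
        e_loc = local_energy(x, a)
        dlogpsi = -x**2                        # d/da of log psi_a(x)
        # Energy gradient: 2 * covariance of the local energy with d(log psi)/da.
        grad = 2.0 * (np.mean(e_loc * dlogpsi) - np.mean(e_loc) * np.mean(dlogpsi))
        a -= lr * grad
    x = metropolis(x, a, rng)                  # fresh samples for the final energy estimate
    return a, float(np.mean(local_energy(x, a)))

a_opt, energy = optimize()
print(a_opt, energy)
```

Replacing the single parameter a with the weights of a network, and the Gaussian with a physically motivated ansatz, gives the general shape of the approach, though the details in the paper differ.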

I had not even considered how things might work if you actually want to think about time.

Do you know if anybody has been running this NN+DiffEq solver stuff on big HPC systems that also have GPUs? If you know of any papers where they tried this, would be interesting to look at.

I see a Poisson solver in the docs.

Is there a paper comparing the performance of this particular solver against the state of the art?

(if you are using GPUs, the AmgX library has a finite-difference solver for Poisson in their examples - very far from the state of the art, but a comparison might put performance in perspective)
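
For reference, a finite-difference Poisson baseline of the kind mentioned above can be sketched in a few lines of Python. This is not the AmgX example and says nothing about its performance: it is a 1D problem, -u'' = f on (0, 1) with zero Dirichlet boundary values, solved with a dense matrix for brevity. The test problem and function name are assumptions; a real comparison would use a sparse or multigrid solver in 2D or 3D.

```python
import numpy as np

def solve_poisson_1d(f, n):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 using second-order central differences."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)             # interior nodes
    # Tridiagonal stencil (-1, 2, -1) / h^2, stored densely to keep the sketch short.
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return x, np.linalg.solve(A, f(x))

# Manufactured solution u(x) = sin(pi * x), so f(x) = pi^2 * sin(pi * x).
x, u_fd = solve_poisson_1d(lambda x: np.pi**2 * np.sin(np.pi * x), n=99)
print(np.max(np.abs(u_fd - np.sin(np.pi * x))))   # discretization error, O(h^2)
```

Reporting both the error and the wall-clock time of a baseline like this alongside the neural network solver is one way to put the numbers in perspective.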