by abefetterman 3246 days ago
This is actually a really exciting development to me. (Note: what is exciting is the "optometrist algorithm" from the paper [1], not necessarily Google's involvement as pitched in The Guardian.) Typically a day of shots would need to be programmed out in advance, usually scanning over one dimension (out of hundreds) at a time. It would then take at least a week to analyze the results and create an updated research plan. The result is poor utilization of each experiment in optimizing performance. The 50% reduction in losses is a big deal for Tri Alpha.

I can see this being coupled with simulations as well, both to understand sources of systematic errors and to create better simulations, which can then be used as a stronger source of truth for "offline" (computation-only) experiments.

The biggest challenge of course becomes interpreting the results. So you got better performance: which parameters really made a difference, and why? But that is at least a more tractable problem than "how do we make this better in the first place?"

[1] http://www.nature.com/articles/s41598-017-06645-7


Though this work may seem exciting, there is an existing, respected body of work available on how to mathematically structure a search over a large parameter space and how to mathematically interpret experimental responses. That body of work is a subset of applied statistics called design of experiments. It helps scientists avoid the common failures that result from doing exactly what was done here: random space exploration and non-rigorous evaluation of results.

For this to be exciting, I would expect some indication as to how this method extends and enhances the existing science of experimental methods, and of the trade-offs involved in using their method. I don't see that.

It would not surprise me if high-tech companies are inventing new, useful things in this field.

In my career as first a scientist and then an engineer, I've found very few practical users of highly technical experimental design theory, and all of them were in industry. These algorithms move about intelligently along all dimensions of some search space, whereas in the lab we preferred to turn just one knob at a time.

One reason is that the algorithms are optimally searching for "known unknowns" -- that is, they assume they roughly understand the problem. The lab is a world of unknown unknowns, where the more plodding, understandable protocols tend to be safer.

But in industry, some problems are of the known-unknowns type. And experiment runs can burn up seriously expensive hardware time. So it makes sense for fusion researchers and cloud-computing giants alike to invent new practical ways to optimise searches.

Besides, optimising searches is what Googlers are for.
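
To make the "one knob at a time" versus designed-experiment contrast concrete, here is a minimal sketch that enumerates the runs for both approaches on three two-level knobs. The function names and the 0/1 knob levels are made up for illustration. A two-level full factorial is only the simplest textbook design, not anything taken from the paper.

```python
from itertools import product

def one_factor_at_a_time(baseline, high):
    """Baseline run, then one run per knob with only that knob set high."""
    runs = [tuple(baseline)]
    for i in range(len(baseline)):
        run = list(baseline)
        run[i] = high[i]
        runs.append(tuple(run))
    return runs

def full_factorial(baseline, high):
    """Every combination of low/high settings across all knobs (2**k runs)."""
    return list(product(*zip(baseline, high)))

# Hypothetical settings: three knobs, each with a low (0) and high (1) level.
low = (0, 0, 0)
high = (1, 1, 1)

ofat_runs = one_factor_at_a_time(low, high)
factorial_runs = full_factorial(low, high)

print(len(ofat_runs), "one-knob-at-a-time runs")
print(len(factorial_runs), "full-factorial runs")
```

The one-knob-at-a-time plan never sets two knobs high in the same run, so it cannot see interactions between knobs. The full factorial can, but its run count doubles with every added knob, which is why it does not scale to hundreds of dimensions without further structure.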

Reading their actual paper further, it seems I read a bit too much into the original article. However, as their paper mentions:

> The parameter space of C-2U has over one thousand dimensions. Quantities of interest are almost certainly not convex functions of this space. Furthermore, machine performance is strongly affected by uncontrolled time-dependent factors such as vacuum impurities and electrode wear.

I'm not aware of DOE procedures that are robust to these types of issues, and would certainly appreciate any literature you have on the subject.

Regardless of theoretical literature, this procedure has enabled a dramatic shift in how these scientists think about their experiment. Furthermore it has enabled them to achieve results much faster than before (if you have been following Tri Alpha, it has been a real slog). Both of these are exciting to me even if they don't break new ground in the design of experiments.

Interesting. Are you able to provide some links to decent resources on this topic?
Perfect! Thanks.
As a complete outsider, I don't understand what's special about the "optometrist algorithm." As described in the Nature article it's just hill climbing using humans as the evaluation function.

Isn't it basically the same thing they were already doing but more granular?

Basically nobody was using automated gradient descent, etc., because of the proclivity of these algorithms to get stuck on a boundary. The problem is the boundaries are not well defined. One example might be a catastrophic instability. If it gets triggered it has the potential to damage the machine. But the exact parameters in which the instability occurs are not well known. So with this algorithm you mix the best of both worlds: the human can guide away from the areas where we think instabilities are, and the machine can do its optimization thing. It's pretty simple overall but enables a big shift in how experiments are run.

Edit to add: these instabilities often look just like better performance on a shot-to-shot basis, which makes the algos especially tricky. Using a human we could say "this parameter change is just feeding the instability" vs "oh this is interesting go here"

I am still very skeptical that a human is really that good at avoiding the problem areas, although they might be marginally better. Plus, they don't seem to claim that anywhere in the paper. Instead, they just rated shots as either "better" or "just as good", i.e., a local evaluation, which won't let you avoid such areas; avoiding them is of course a judgement that requires more knowledge than just the conditions in the neighborhood of the current reference.

The only thing I think can lead someone to your conclusion is that they can judge based on a host of criteria, not just a pre-defined set of criteria. Maybe that's what you meant. Of course, intuitively, changing your criteria midstream would lead to bias in your judgement, I'd think, but that may be the real innovation here, one that is hard to achieve without a human judge in the mix.

> I am still very skeptical that a human is really that good at avoiding the problem areas

Why? Humans have a much richer modeling apparatus than any computer does right now. We can draw simultaneously on a very large set of possible models that is almost fully tuned to reality. You can estimate the number of available models as the number of combinations of however many neurons you have. We also have machinery for searching that entire model space simultaneously and testing against a continuous stream of megabytes of data in realtime, in order to find good fits.

Existing AIs wouldn't even know where to start. They can apply infinite models, but have no grounding in reality, and no way to choose amongst them. The AI doesn't even have an intrinsic sense of space, seeing as how it lacks a body. It's a very fast worker that can get things done when you give it very specific instructions, but it has no real ability to understand what it is doing or why it would want to do something different.

Remember that the human won't only be thinking "better" or "just as good". They almost certainly can't avoid thinking "If I say 'better' here, what direction will that drive the algorithm in?" They don't just learn how to drive the plasma, they're learning how to drive Optometrist as well.
To be clear: there are no gradients here (right?) This is just 0th order hill-climbing with a human assist.
how does one climb a hill with no gradient? [serious question]
You can climb a hill without knowing the gradient, so long as you can compare two points in terms of height. You randomly move in some direction, then compare the new point to the old point, go to whichever of them is higher, and repeat.

This sounds like what the experimenters are doing. Perhaps the GP was alluding to "first order hill climbing" as evaluating the gradient in every direction and climbing the steepest one, but the "0th order" version is also usually considered hill climbing and is better for some classes of problem.
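
A minimal sketch of that comparison-only ("0th order") climb, for anyone who wants to see it concretely. Everything here is illustrative: `height` is a made-up toy landscape, and `compare` stands in for whatever judge decides which of two points is better (in the experiment as described in this thread, that judge is a human rating shots). It is not the procedure from the paper.

```python
import random

def hill_climb(is_better, start, step=0.1, iterations=1000, seed=0):
    """Zeroth-order hill climbing: uses only pairwise comparisons, no gradient."""
    rng = random.Random(seed)
    current = list(start)
    for _ in range(iterations):
        # Propose a random point close to the current one.
        candidate = [x + rng.gauss(0.0, step) for x in current]
        # Move only if the judge prefers the candidate.
        if is_better(candidate, current):
            current = candidate
    return current

def height(p):
    # Hypothetical toy landscape with a single peak at (1, -2).
    return -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)

def compare(a, b):
    # Stand-in judge: here it just compares heights.
    return height(a) > height(b)

best = hill_climb(compare, start=[0.0, 0.0])
print(best, height(best))
```

Note that `hill_climb` never sees the value of `height`, only the answer to "is this one better?", so the judge is free to be anything that can rank two points.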

That is exactly what they're doing. See the section on Exploratory Technique, second to last paragraph. As I said above, the possible innovation here is they can change midstream the criteria one uses to decide what is a "better shot".
is it picking a new configuration at random, or does it still have to be "close" to the last configuration?
The naive way to do calculus. Use secants to approximate the tangent. I think it's called finite difference.
Finite differences: you estimate the gradient.
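
For reference, a minimal sketch of estimating a gradient by finite differences, using the central-difference variant of the secant idea mentioned above. The function `f` and the step size `h` are arbitrary choices for illustration.

```python
def finite_difference_gradient(f, x, h=1e-6):
    """Estimate the gradient of f at x with central differences (secant slopes)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        # Slope of the secant through the two neighbouring points along axis i.
        grad.append((f(forward) - f(backward)) / (2.0 * h))
    return grad

def f(p):
    # Hypothetical test function; its exact gradient is (2 * p[0], 3).
    return p[0] ** 2 + 3.0 * p[1]

estimate = finite_difference_gradient(f, [2.0, 5.0])
print(estimate)
```

Each gradient estimate costs two evaluations of `f` per dimension. If every evaluation is a plasma shot and the space has over a thousand dimensions, as the paper quote above says, a single gradient would take thousands of shots.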
Perhaps a stupid question, but why can't the whole experiment be run as a simulation?
Even if replacing experiments with simulations would be the dream of a lot of theoretical physicists, this must not happen! Ever! Even if every complex system in the world could be simulated in reasonable time, it would still require experiments to verify or falsify the simulation results. A simulation is essentially just a calculation from a model someone came up with to describe a system. In order to check how good the model is, one has to check it against experimental data. Just expanding the models without experimental verification will not necessarily result in a good theoretical description. It would be like writing software without testing the components and expecting it to work correctly when you're done. There was recently an article on HN where economists were described as the astrologers of our time [1], since they do not verify their mathematical models to an extent where they can predict economic systems. This is another example where more experimental data should be considered in order to falsify certain theories.

Those are the reasons why string theorists will not (and should not) get any Nobel Prize in the next decades. Since the theory's predictions are hard to measure on those small scales, there's no way of telling if the model is any good until it is compared against suitable experimental data.

[1] https://aeon.co/essays/how-economists-rode-maths-to-become-o...

Agreed. My background is philosophy, and while I rarely get into the STEM arguments, this has everything to do with inductive learning vs deductive learning. Any simulation will be run with the premises already built in, but cutting edge science is always about learning what those premises are. If we knew what they were, it'd be trivial to set up the reactor. Here we need inductive experimentation to learn how to simulate it trivially.
If you're doing science, experiments are hugely important. If you're doing engineering and you're reasonably sure that the physics guys came up with a good model, having everything in a computer would make development a lot cheaper.
Thanks, this is the best HN 'rant for the common sense' in a long time :)
I believe this is more about solving an engineering/mathematics problem, than about fundamental physics and the scientific process.
Physics is a lot more than just fundamental physics. H-Bomb designs, for example, get hundreds of hours of supercomputer time to simulate a few pounds of stuff for 1/1,000th of a second, and even then they are approximations which need to be validated.
Because fusion simulations are really hard. This simulation[1] took 15 million hours of CPU time to model a cubic cm of plasma. The results were used to update 5 scalar parameters in a model.

[1] http://news.mit.edu/2016/heat-loss-fusion-reactors-0121

Does anyone have an idea about what software they use to simulate this stuff?

I'm wondering if they can even make use of the newfound GPU power or are just going ahead with ancient CPU-based software because too much work has already been put in.

SpaceX is doing the best work on simulation. The adaptive multiscale work is a million times more important than moving to GPU, but of course they did that too:

https://www.nextplatform.com/2015/03/27/rockets-shake-and-ra...

Looks interesting. It looks like they are creating the CFD software for their own specific application. While that's cool and all, I doubt any other companies have the resources/motivation to write complex software from scratch, let alone underfunded postdocs.

I'm wondering about all the research that goes on in all the universities where large investments have been made in CPU-based clusters. The simulation in the article you linked was run on NERSC servers, which are Cray supercomputers[1], which pretty much are Intel Xeon class servers with fancy interconnects.

So looks like it is CPU based, but I'm still interested in the software they use.

[1]: https://my.nersc.gov/nowcomputing-cs.php

It's usually highly-parallelized Fortran run on the world's largest supercomputers, utilizing thousands of CPU cores. There are several codes like the ones in the study above (google "gyrokinetic equation solver"), and somehow more pop up year by year. So it's not a matter of sunk costs.

And yes, GPUs are increasingly being utilized, depending on the algorithm. But GPUs aren't magic; they don't speed up every kind of problem.

The numbers are too big, and nature is hiding stuff from us.

So we can't simulate it because we don't know enough to simulate it. And even if we did know there's not enough computing power to do so.

The system is fundamentally 6^N dimensional with N~10^23.
I suppose you meant 6*N? Which is a lot better, but still intractable. And anyway, we don't exactly resolve molecules in e.g. turbulent flow simulations, yet they still take tens, even hundreds of millions of CPU-hours.
6*N, yes. Pretty bad mistake there. But yes, even if you don't model every particle and restrict yourself to "parcels" of fluid like in most simulations, you still have a very difficult problem.
Okay, this is really showing my ignorance but why 6?

You start off with 4 (3 space plus one time (ignoring 11-dimensional space-time)) and add which dimensions exactly? Can the individual interactions between wave/particles be reduced to 2 dimensions? Aren't they going to interact along the whole range of forces they exert: gravitational, weak, electromagnetic, strong?

3 dimensions for position + 3 dimensions for velocity
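
A back-of-the-envelope sketch of what 6*N means in practice, taking N ~ 10^23 from upthread. The 8 bytes per coordinate (one double-precision number) is my own assumption, only there to give a sense of scale.

```python
BYTES_PER_COORDINATE = 8  # assumption: one double-precision float per coordinate

def phase_space_dimensions(n_particles):
    """3 position + 3 velocity coordinates per classical particle."""
    return 6 * n_particles

n_particles = 10 ** 23  # order of magnitude quoted upthread
dims = phase_space_dimensions(n_particles)
snapshot_bytes = dims * BYTES_PER_COORDINATE

print(f"{dims:.1e} coordinates")
print(f"{snapshot_bytes:.1e} bytes for a single snapshot of the state")
```

That is the storage for one instant of a purely classical state, before taking a single time step.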
Unless you consider quantum effects, which are probably relevant in this situation, then it's an exponentially larger problem space.
Yeah, 6^N dimensions are fun! ;)