Hacker News new | ask | show | jobs
by MITSardine 215 days ago
Math isn't about memorizing closed-form solutions, but analyzing the behavior of mathematical objects.

That said, I mostly agree with you, and I thought I'd share an anecdote where a math result came from a premature implementation.

I was working on maximizing the minimum value of a set of functions f_i that depend on variables X. I.e., solve max_X min_i f_i(X).

The f_i were each cubic, so F(X) = min_i f_i(X) was piecewise cubic. X was dimension 3xN, N arbitrarily large. This is intractable to solve as, F being non-smooth (derivatives are discontinuous), you can't well throw it at Newton's method or a gradient descent. Non-differentiable optimization was out of the question due to cost.

To solve this, I'd implemented an optimizer that moved one variable at a time x, such that F(x) was now a 1d piecewise cubic function that I could globally maximize with analytical methods.

This was a simple algorithm where I intersected graphs of the f_i to figure out where they're minimal, then maximize the whole thing analytically section by section.

In debugging this, something jumped out: coefficients corresponding to second and third derivative were always zero. What the hell was wrong with my implementation?? Did I compute the coefficients wrong?

After a lot of head scratching and code back and forth, I went back to the scratchpad, looked at these functions more closely, and realized they're cubic of all variables, but linear of any given variable. This should have been obvious, as it was a determinant of a matrix whose columns or rows depended linearly on the variables. Noticing this would have been 1st year math curriculum.

This changed things radically as I could now recast my maxmin problem as a Linear Program, which has very efficient numerical solvers (e.g. Dantzig's simplex algorithm). These give you the global optimum to machine precision, and are very fast on small problems. As a bonus, I could actually move three variables at once --- not just one ---, as those were separate rows of the matrix. Or I could even move N at once, as those were separate columns. This could beat all the differentiable optimization based approaches that people had been doing on all counts (quality of the extrema and speed), using regularizations of F.

The end result is what I'd consider one of the few things not busy work in my PhD thesis, an actual novel result that brings something useful to the table. To say this has been adopted at all is a different matter, but I'm satisfied with my result which, in the end, is mathematical in nature. It still baffles me that no-one had stumbled on this simple property despite the compute cycles wasted on solving this problem, which coincidentally is often stated as one of the main reasons the overarching field is still not as popular as it could be.

From this episode, I deduced two things. Firstly, the right a priori mathematical insight can save a lot of time in designing misfit algorithms, and then implementing and debugging them. I don't recall exactly, but this took me about two months or so, as I tried different approaches. Secondly, the right mathematical insight can be easy to miss. I had been blinded by the fact no-one had solved this problem before, so I assumed it must have had a hard solution. Something as trivial as this was not even imaginable to me.

Now I try to be a little more careful and not jump into code right away when meeting a novel problem, and at least consider if there isn't a way it can be recast to a simpler problem. Recasting things to simpler or known problems is basically the essence of mathematics, isn't it?