| I hope you don't expect me to respond to every statement of yours in that mountain of text. This response of mine has already grown too long. You don't have to convince me that mathematics is important--my background is in pure mathematics. I just don't think CS people are hapless fools who need applied math people like you from the outside to help them out. In my book, the non-systems part of CS is already a part of mathematics. Since you bring it up, I've always considered numerical methods as firmly a part of applied mathematics. Combinatorial search and optimization problems are in the heartland of CS. The simplex method straddles numerical optimization and combinatorial optimization and is a bit of a hermaphrodite. Low-dimensional LPs are important in computational geometry and that field has produced some beautiful algorithms. Here's Seidel's randomized algorithm: Throw out a random hyperplane and solve the smaller problem recursively. If this optimum satisfies the excluded constraint then we are done. Otherwise the true optimum must lie on the hyperplane, so we can reduce the dimension of the problem by 1. > Net, without the math, there is little reason to take the algorithm seriously. An algorithm _is_ mathematics. The reason I said you seemed ignorant of computer science is this apparent belief that computer science is a random bunch of algorithms without any supporting theory. > Well, to take the algorithm seriously, we need more than just the algorithm Your premise is based on a strawman. How do you think algorithms are designed? They aren't generated randomly. They are generated based on exactly the kind of insight that generates theorems and proofs. There may not be tight proofs of every property we'd like to know for certain. That is certainly true for the simplex method! Its worst-case running time is exponential (e.g. Klee-Minty), and there is a small cottage industry devoted to proving its running time is polynomial in the average case (for various definitions of 'average case'). People have been using the simplex method very effectively since its inception despite knowing that its pathological behavior can be very bad indeed. Had you wanted to make your point more effectively, you should have picked interior point methods, which have the additional benefit of working for a much wider class of convex problems like SDPs. > Claim: If z is bounded above on F, then z achieves its least upper bound. Note: The proof is not just usual analysis with converging sequences from Rudin's 'Principles'. This has nothing to do with linearity or convexity. It's true for any continuous function f : X -> R on a closed subset F of a complete topological space X. If f is bounded above on F then its supremum on F is achieved at some x in X. By the supremum property and continuity of f, you can find a net in F which accumulates around x. This Cauchy net converges to x since X is complete. Because F is closed it contains its own accumulation points. Hence x is in F. This proof is exactly at the level of Baby Rudin if you drop back the generality a bit and work with metric spaces. > For us to take the simplex algorithm seriously, we need to know that this claim is true. Nonsense. Any implementation of the simplex method is already going to introduce round-off error, so the theoretical difference between optimizing on a set and its closure has no practical consequences. If your point is that the achievement of the supremum ensures termination of the simplex method, that is false. The simplex method is hill climbing on a polytope's vertex graph. Whenever there is a local tie between neighboring vertices you need a tie-breaking pivoting rule. Bad pivoting rules (e.g. lexicographic ordering) will lead to cycling and non-termination. In the absence of such local ties, all you need to know to prove termination is that the number of vertices is finite and that the objective function strictly decreases each step. Even the proof of termination when pivoting with Bland's rule is purely combinatorial. An example of something that's actually important in practice is how strong Lagrange duality supplies non-heuristic stopping criteria for convex programs via dual feasible points. That way we know what we're giving up by being suboptimal. Duality theory is also much richer and subtler than the kind of mindless definition-chasing freshman homework problem you bought up above. |
Would, could, and should be, but so far in practice in the universities and elsewhere very much is not.
> Combinatorial search and optimization problems are in the heartland of CS.
No: They are in the "heartland" of optimization, 'operations research', and applied math. The CS people aren't good enough with the theorems and proofs to make progress with the math. E.g., not nearly enough CS profs went through a careful theorem proving course in abstract algebra or through Rudin's 'Principles'.
> An algorithm _is_ mathematics.
Well, an algorithm is necessarily mathematically something, but by itself what it is mathematically is unknown and, thus, not in any very meaningful sense mathematics.
I gave a definition of an 'algorithm': Again, yet again, just look at the code or pseudo-code, and you don't have much. To take such code seriously, need some logically prior math, some actual math, as in a math department and in the tradition of von Neumann, Halmos, Rudin, Birkhoff, Bourbaki, etc. CS tries hard to avoid such math and, thus, is stuck in making progress in algorithms.
> Your premise is based on a strawman. How do you think algorithms are designed? They aren't generated randomly. They are generated based on exactly the kind of insight that generates theorems and proofs.
No: Big themes now in CS are to come up with algorithms by whatever means -- genetic, intuitive, heuristic, 'clustering', neural network fitting, 'rules', 'machine learning', 'artificial intelligence', etc. -- often with no "insight" at all and, in particular, and most significantly, with no reason to take the algorithm at all seriously.
> there is a small cottage industry devoted to proving its running time is polynomial in the average case (for various definitions of 'average case').
K. H. Borgwardt.
Empirically the running time on usual problems has long been known to be about 3m iterations for a problem with m constraints.
> Had you wanted to make your point more effectively, you should have picked interior point methods, which have the additional benefit of working for a much wider class of convex problems like SDPs.
You are missing my point: I'm using the problems in the top 10 list, optimization, digital filtering, etc. just as sources of examples of my point. Again, yet again, just for you, one more time, please actually read it this time, my point, already repeated over and over and over, and here repeated yet again, about algorithms, and having nothing to do with optimization, is:
"For reasonably complicated problems, the key is some applied math, and we take a corresponding algorithm seriously only because of the logically prior applied math."
Here I said "For reasonably complicated problems" but said nothing, zip, zilch, zero, about optimization or digital filtering. My claim holds for algorithms, all algorithms, ALL of them, for whatever purposes or problems, in the full generality of algorithms "for reasonably complicated problems".
Why? Again, yet again, with just an algorithm, all we have is just the code, and for any very complicated algorithm that means next to nothing down to nothing at all. This point is the same for simplex for linear programming as it is for interior point methods for achieving the Kuhn-Tucker conditions or for anything else complicated.
So, again, yet again, given the algorithm, just the code, how to take it "seriously"? There ain't but just two ways: First, can evaluate the algorithm just empirically by running it on a lot of data. This way was long ago adopted essentially in whole by the entire CS 'artificial intelligence' community. Second, can have something 'logically prior' that do take seriously because it has theorems and proofs. For this second way, CS is stuck-o because they are not good enough with theorems and proofs because they didn't take the right courses in the math department in grad school.
You find another way, another means, another 'paradigm', for taking an algorithm seriously, and I will jump for joy, but I'm not holding my breath until then.
> This has nothing to do with linearity or convexity. It's true for any continuous function f : X -> R on a closed subset F of a complete topological space X. If f is bounded above on F then its supremum on F is achieved at some x in X. By the supremum property and continuity of f, you can find a net in F which accumulates around x. This Cauchy net converges to x since X is complete. Because F is closed it contains its own accumulation points. Hence x is in F.
You typed too fast, way too fast. You are flatly, badly, easily, trivially, clearly wrong!
Counterexample: Let R denote the set of real numbers and let x, y be in R and x, y > 0. Then the problem is to find x to minimize y subject to y >= 1/x.
Then y is bounded below but does not achieve its greatest lower bound, which is 0. Done.
Why? There is no compactness. In your argument, you omitted the assumption of compactness. And as we know for positive integer n, in Euclidean R^n, a subset is compact if and only if it is closed and bounded. Yes, I made As in the course in Rudin's 'Principles'!
The set I illustrated
{ (x, y) | x, y > 0 and y >= 1/x }
is closed in the usual topology for R^2 but not bounded.
So, your argument based on Rudin style convergence does not establish my linear programming
Claim: If z is bounded above on F, then z achieves its least upper bound.
Since F is an intersection of finitely many closed half spaces, F is closed. But F need not be bounded. Still my claim holds even when the feasible region F is unbounded. So, proofs need to make some crucial use of linearity, that is, that the function to be maximized is linear and the half spaces are from linear functions.
Beyond Bland's proof (which really came from some of his work in matroid theory), can also establish the results for linear programming and simplex by some careful use of 'separation theorems' or, if you will, theorems of the alternative. But, again, linearity is crucial.
We very much do need to know the properties I listed of the simplex algorithm and linear programming.
If you are concerned about floating point, then just do the arithmetic exactly. For this, consider the M. Newman approach of solving a system of linear equations by multiplying through by suitable powers of 10 to convert all the data to integers and then solving in exact machine single precision arithmetic in the field of integers modulo a prime for a sufficiently large set of prime numbers and constructing the multiple precision numerators and denominators via the Chinese remainder theorem. Besides, in practice, the floating point issue is usually not very serious.
What you meant about Lagrangians is not so clear, but, sure, there is some nonlinear duality theory where the dual of a nonlinear minimization problem is a maximization of a concave function which, then, can be approximated by supporting hyperplanes found from efforts with the primal problem. The proof is short and easy. To write out the proof, I'd want TeX but don't have that on HN.
Uses of this result are commonly called 'Lagrangian relaxation'. Once I had a 0-1 integer linear program with 40,000 constraints and 600,000 variables and got a feasible solution within 0.025% of optimality in 905 seconds on a 90 MHz computer from 500 iterations of Lagrangian relaxation. To maximize the approximations to the concave function, I used the old Watcom Fortran version of the IBM Optimization Subroutine Library (OSL).
This work with Lagrangian relaxation is a multidimensional generalization of H. Everett's single dimensional 'generalized Lagrange multipliers' that he used on DoD resource allocation problems at his Lambda Corporation (after he did his 'many worlds' version of quantum mechanics).
Again, such an example doesn't add or detract from my basic point that we have no good way to take an algorithm by itself seriously and, thus, for an algorithm to be taken seriously need some logically prior work we do take seriously, which is forced to be math with theorems and proofs, which the algorithm merely implements.
Again, find another way to take an algorithm seriously and I will jump for joy. So, in particular, as long as the CS people pursue algorithms without careful, powerful, logically prior work with math theorems and proofs, they will be stuck-o for progress. Sorry 'bout that.