Integer programming often seems magical. I remember early grad school and seeing "Optimal and Near-optimal Global Register Allocation Using 0–1 Integer Programming" by Goodwin et al, where the very crunchy problem of register allocation was solved largely by punting it to a ILP solver. Gotchas abounded but it was eye opening to see how many hard problems could be solved (or at least adequately approximated) in this fashion.
I think this is one of those techniques that every serious computer scientist should have in their toolbox. Someone who knew some statistics, a smattering of something like SMT (enough to drive Z3) and some linear programming could routinely resemble Gandalf at any shop that has people struggling with difficult problems.
The main thing is not to turn into a cookbook guy. I've known a few people whose ability to just build an actual algorithm to solve some interesting problem has atrophied since they reach for a ILP solver the minute anything gets difficult.
Agreed that it seems magical. This is one of the areas where the open-source tools are way behind commercial solvers. Cplex and Gurobi are really impressive pieces of software.
Doubly magical now that computers (and solvers) are so fast. A few times I've thought, "Oh, I could reduce this to a max-flow problem, and write/dig up an algorithm to solve it" before realising it'd just be easier to write it as a linear program (or an integer program "just in case".)
And then if weird constraints come along that broke the max-flow reduction, I could usually shoehorn that into the formulation.
Can't remember how to write Dijkstra's algorithm? Meh, integer programming it is. Want to write a sudoku solver and you can't be arsed with dancing links or backtracking or prolog or... Whatever, CBC will do it. It'll be nasty and a hell of a lot slower than doing it "the right way" in a fast language, but it might not be slower than doing it in Python or Javascript.
The latter is what scares me. I solved a 'bit twiddling' problem using Z3 and felt a little ashamed. It is reminiscent of Brewster's letter to Martin Gardner "Feeding a recreational puzzle into a computer is no more than a step above dynamiting a trout stream". It's very easy to let skills atrophy and resort to using black boxes we don't really understand (well, I speak for myself, anyhow). When Z3 or cplex or whatever tool du jour gets wedged on a problem, I am not sure I have the background in these tools to get unstuck.
With Z3, unfortunately, the issue is that we are really bad at gauging what makes for a hard SMT problem. Often, knowing that a particular problem is hard is the same as solving it! That said, you can do some really interesting things with SMT solvers if you can formulate the problem the right way.
Gurobi, CPLEX, and friends are less magical if you have enough mathematical maturity. Typically, if you are comfortable with linear algebra (a subject woefully ignored in most undergraduate curricula), you can work with them. In my experience, working with optimization engines is a two step process. First, you need to figure out how to formulate the problem in a smart way (typically this means that it's convex with constraints in a certain format), which is normally the hard part. Then you scroll through the list of available algorithms that the engine has at your disposal and pick one with properties that you like (they often have suggestions if you have no strong preference). Normally, if you can formulate the problem so it's convex, you will at least have a good shot of doing well with your optimization.
I have a Phd in optimization and I still think Cplex and Gurobi seem magical. The devil is in the details. Open source solvers are comparatively not very good.
I wrote a sudoku solver in my first year of college using some heuristsics and some sort of branch-and-cut algorithm, in java. It used bit masks to keep track of the state and possiblities, I was pretty proud, spent like weeks on that.
Recently I re-implemented in Pulp in an afternoon using ILP.
It's similarly fast, both can solve similar sets of problems. But the ILP solution was so much easier and shorter.
I mean branch-and-bound - i.e. just reject when solution becomes infeasible and back-track. No cutting planes in first year college. ;)
I think the heuristics may have been based on selecting cells that had the least possible solutions, in order to quickly 'zoom in' on the areas where you can quickly prune infeasible solutions.
Whats the distinction between Integer Programming and Constraint Programming? From a cutlery glance they appear to be the same solution to the same problem...
Integer programming is effectively searching the edge of a convex polygon (polytope for higher dimensions) for an optimal value function, defined in terms of the coordinates of the points in space. The inequalities are planes that subdivide the solution space into permitted and non-permitted domains.
Constraint programming generates lots of potential solutions (combinatorially) and prunes the search tree to make the large numbers tractable.
The intuitions behind the two techniques are quite different.
> Integer programming is effectively searching the edge of a convex polygon...
Great definition of linear programming. In a sense the thing that makes integer programming hard is that the feasible region is not convex -- for two feasible solutions x and y, ax + (1-a)y for 0<=a<=1 is not necessarily feasible.
I'd also say that integer programming is a kind of constraint-based programming extended with the addition of an objective function -- we're not just looking for any satisfactory answer, but an answer with a cost or value that cannot be improved upon. You're definitely right that "things called constraint-based programs" tend to be solved in different ways, though (and the languages they're expressed in tend to be nicer, too.)
I think your definition of Integer programming is not correct. Integer programming or Mixed Integer Programming assumes that the solution has the all or some variables, integers. Most of the Integer programming models are usually using binary variable to indicate decisions.
Further, I think you tried to describe Linear programming in the beginning of your post. But Linear programming searches the vertices of the polytope and not the edges.
The biggest issue with Integer programming is the lack of convexity of the solution space when compared to Linear programming. So the approach for solving them is very different compared to Linear programming. If I made any mistakes in my post, I welcome to be corrected.
Constraint programming can be viewed as looking for feasible solutions to optimization problems.
So, what is optimization? For positive integers m and n, the real numbers R, and functions
f: R^n --> R
g: R^n --> R^m
find x in R^n to (problem MP1)
maximize f(x)
subject to
g(x) <= 0
where g(x) is regarded as an m x 1 matrix
and the 0 is also m x 1 but with all its components 0.
The constraints are the
g(x) <= 0
We say that we have m constraints.
The subject is essentially the same
using
g(x) >= 0
or
g(x) = 0
instead or minimizing instead of
maximizing.
Point x in R^n is feasible provided
it satisfies the constraints, that is,
g(x) <= 0
The feasible region is the set
F of all feasible x.
A point x is optimal if it is
feasible and makes f(x) as large
as possible. That is, x is
optimal provided for any feasible
y we have f(x) >= f(y).
If one or more of the functions
f or g is non-linear, then we
have non-linear programming.
Here we may use the Kuhn-Tucker
conditions to seek a point
that satisfies necessary conditions
for optimality. For an algorithm,
we may take gradients of the
constraints to set up a linear
program, solve that, do a line search,
and try again.
If functions f and g are linear, easily written using matrix algebra, then the optimization problem is linear programming (here programming is used in the British sense of operational planning).
Function f is convex provided for
any x, y in R^n and any t in [0, 1]
we have
t f(x) + (1 - t) f(y) >= f( t x + (1 - t) y )
That is, in a graph of f, a straight line
between (x, f(x) and (y, f(y) ) is on or
over the graph and never under the
graph. So, the straight line over
approximates the function.
If f is convex, then it is continuous (proof is not very difficult but is in Fleming, Functions of Several Variables). IIRC convex implies differentiable almost everywhere (maybe proof in R. T. Rockafellar, Convex Analysis). The epigraph is the graph of f and everything above it, and it is a convex set so has supporting hyperplanes and is the intersection of all the closed half spaces from its supporting (tangent) hyperplanes (IIRC, proof in Fleming). Such a
supporting hyperplane is called a
subgradient and can be quite useful
in optimization, e.g., Lagrangian
relaxation. There is a closely related,
really nice result in Hilbert space: Every
non-empty, closed convex subset has a unique
element of minimum norm. That can
be very nice to know!
To be more clear, a convex function, while
continuous, need not be differentiable.
Why not? Intuitively, at a point, the
function makes a sharp turn. Similarly, a cube is a convex set but has 10 sharp
edges and 8 sharp points that have supporting (tangent) planes, but these planes are not unique. So, a subgradient at a point would be a gradient and a
tangent plane if at that point it was
the only tangent plane. So, intuitively,
a subgradient is nearly a tangent
plane.
Function f is concave provided
that -f is convex. If f is convex
and -f is convex, then f is linear --
so, intuitively, a function that
is convex (or concave)
is half of being linear.
We have some astoundingly
powerful methods for solving
linear programming problems.
A significant fraction of the
Nobel prizes in economics
have been from economic
models based on linear
programming.
Some special cases, e.g.,
least cost network flows,
maximum matchings,
are special cases with
even better results for the
algorithms.
If in MP1 function f is concave and
funtion g is
linear, then again we have some
good methods.
If in MP1 function f is quadratic
and concave and function g is
linear, then we have the problem
of Markowitz in investment
portfolio management and his
Nobel prize in economics.
If we are maximizing a
convex function subject to
linear constraints, then we
have, IIRC, a problem in NP-complete.
If in linear programming
we require in addition that some or all of the components
of x are required to be integers,
then we have some integer
constraints and a problem in
integer linear programming (ILP).
ILP was one of the earliest
problems shown to be in
NP-complete.
So, that's a nutshell version of optimization.
Can also extend to accommodating randomness and
doing much the same over time.
The over time part is control theory,
e.g., how to fly an airplane from one point
to another in least time -- there are books, e.g., by Athans and Falb. Mike Athans was long at MIT and Falb, at the Brown Division of Applied Mathematics. Control theory
had a flurry of interest in the space
program. There is an extension, differential game theory, which
is like how best for a missile to
chase an airplane where the airplane
is best trying to avoid the missile.
Typically in control theory and
differential game theory we look
for necessary conditions for
optimality which usually turn out
to be in terms of Lagrange
multipliers.
Optimization over
time with randomness is essentially
stochastic optimal control, and the
main technique is
a version of dynamic programming.
An early proponent was Richard
Bellman.
The Black-Scholes Nobel prize work in
option pricing is a relatively
simple case of stochastic optimal
control.
So, in terms of MP1, constraint programming is, say,
find x in R^n so that
g(x) <= 0
that is, just find a point in the feasible
region R.
An example might be, yesterday
killed 5000 hogs and hung and chilled
the results. Today cut the hogs
and put the pieces in boxes
in 18 wheel trucks, about 40,000
pounds of load per truck, for
delivery to customers about 900
miles away.
Have only three bays in the loading
dock. So, want to know the order
in which to load the trucks to
get them all loaded in time.
So, have a constraint satisfacation
problem and not really an optimization
problem.
Fundamentally, i.e., in the sense
of computational complexity,
finding a feasible
point is as difficult as finding
an optimal point.
So, really, fundamentally,
constraint programming
is no easier than optimization.
IIRC at one time SAP and ILOG
were interested in CPLEX as
means of solving constraint
satisfaction problems. So,
to solve a constraint satisfaction
problem, start with some
optimization software.
I strongly encourage using Julia for anyone applying mathematical optimization. It is (through JuMP[1], for example) one of the areas in which it really stands out.
What the OP is describing is, except for the coding, some now classic applied math, i.e., operations research. That applied math got used much less often than one might have expected because the (A) data gathering was too much pain, expense, and botheration, (B) there was too much software to write, and writing it was too clumsy and expensive, (C) the computing was too slow and too expensive, and (D) even when got a good solution ready for production, the production situation commonly changed so fast that the work of (A)-(C) could not change and revise fast enough to keep up. And, of course, the custom, one-shot software was vulnerable to bugs. Net, in practice, a lot of projects failed. Mostly successful projects needed big bucks and lots of unusually insightful sponsorship high in a big organization.
But, now (A)-(D) are no longer so difficult. This should be the beginning of a new Golden Age for such work.
Sure, since the results of such optimization can look darned smart, smarter than the average human, might call the work artificial intelligence. Really, though, the work is mostly just some classic applied math now enabled in practice by the progress in computer hard/software.
The OP is
a nice introduction to vehicle routing via integer linear programming (ILP) set partitioning. For the linear programming and the case of ILP, I'll give a quick view below. But, now, let's just dig in:
Here is an explanation of the secret approach, technique, trick that can work for vehicle routing and many other problems: The real problems can have some just awful non-linear cost functions, just absurdly tricky constraints, e.g., from labor contract work rules, equipment maintenance schedules, something can't do near point A near lunch, even handle some random things, even responding to them in real-time (dynamically), etc. yet can still have a good shot at getting a least cost solution or nearly so. The "nearly so" part can mean save a lot of money not available otherwise. When there is randomness, then try to get least expected cost.
So, first, the trick is to do the work in two steps.
The first step call evaluation and the second, optimization.
From 50,000 feet up, all the tricky, non-linear, goofy stuff gets handled by essentially enumeration in the first part leaving some relatively simple data
for the optimization in the second part.
In practice, this first
step typically needs a lot
of data on, say, the streets of
a city and requires writing some software
unique to the specific problem.
The second step, the optimization, may require deriving some math and/or writing some unique software, but
the hope is that the step can
be done just by routine application
of some existing optimization software.
The OP mentions the now famous optimization software Gurobi from R. Bixby and maybe some people from Georgia Tech, e.g., from George Nemhauser and Ellis Johnson (long at IBM Research and behind IBM's Optimization Subroutine Library (OSL) and its application to crew scheduling at American Airlines).
First Step.
Suppose you are in Chicago and have 20,000 packages to deliver and 300 trucks. Okay, what trucks deliver what packages to make all the deliveries on time, not overload any trucks, and minimize the cost of driving the trucks? You do have
for each package the GPS coordinates and street address. And you have a lot of data on the streets, where and when traffic is heavy during the day, etc.
Okay, let's make some obvious,
likely doable progress: Of those 20,000 packages, maybe have only 15,000 unique addresses. So, for each address, bundle all the packages that go to that address. Then regard the problem as visiting 15,000
addresses instead of delivering
20,000 packages.
So, you write some software to enumerate. The enumeration
results in a collection of candidate routes, stops, and packages to be delivered for a single truck. For each of those candidates, you adjust the order in which the stops are made to minimize cost -- so here get some early, first-cut, simple optimization. You keep only those candidates that get the packages delivered on time, meet other criteria, etc. You may have
1 million candidate single
truck routes. For each of the the candidates, you find the (expected) operating cost.
So, suppose you have n = 1 million candidate single truck routes.
Also you have m = 15,000 addresses to visit.
So, you have a table with
m = 15,000 rows and n = 1 million columns. Each column is for some one candidate route. Each row is for some one address. In each column there is a 1 in the
row of each address that candidate
route visits and a 0 otherwise. One more row at the top of the table is, for each column, the operating cost of that candidate route.
So, you have a table
of 0's and 1's with
m = 15,000 rows
and n = 1 million
columns. You have a row
with 1 million costs, one
cost for each column.
Again, you have 300 trucks.
So, you want to pick, from
the n columns, some
<= 300 columns so that
all the m addresses get served and the total costs of the
columns selected is minimized.
That is, if add the columns as column vectors, then
get all 1's.
Second Step.
Well, consider variables x_i for i = 1 to n = 1 million. Then we want x_i = 1 if we use the route in column i and 0 otherwise. Let the cost of the route in column i be c_i. We want the total cost (TeX notation):
z(x) = sum_{i = 1}^n x_i c_i
to be minimized. So, right, we take the big table of m = 15,000 rows and n = 1 million columns and call it m x n matrix A = [a_{ij}]. We let m x 1 column vector b have all 1's. We regard x as n x 1 where in row j = 1 to n is x_j.
Then, we get linear program
minimize z(x)
subject to
Ax = b
x >= 0
So, this is a case of linear
programming.
Except in our problem we have one more
constraint -- each x_i is
0 or 1, and in this case
our problem is 0-1 integer
linear programming.
Linear Programming.
In linear programming with n
variables, with the real
numbers R, we go into
the n-dimensional
vector space R^n.
The
Ax = b
x >= 0
are the constraints, and
the set of all x that satisfies
those is the feasible region F,
a subset of Rn.
In R^n, a closed half space*
is a plane and everything
on some one side of it.
Then F can be regarded as
an intersection of m closed
half spaces. So, F has flat
sides, straight edges, and
some sharp points (extreme
points).
Well,
if there is an optimal
solution, then there is
an optimal solution at at
least one
of those extreme points.
So, the famous Dantzig
simplex algorithm
looks for optimal solutions
in iterations were
each iteration starts
at an extreme point, moves
along an edge, and stops
at the next extreme point.
That's for the geometric
view; the algebraic view
is a tweak of the standard
Gauss elimination algorithm.
Linear programming and the
simplex and other algorithms have
a huge collection of nice
properties, including
some surprisingly good
performance both in
practice and in theory.
But, asking for each x_i
to be an integer is
in principle and usually
in practice a
huge
difference and gives us
a problem in NP-complete --
at one time, this was a huge,
bitter surprise.
Warning: At one time, the field of the applied math of optimization in operations research sometimes
had an attitude, that is, placed
a quasi-religious importance
on optimal solutions
and was contemptuous of
any solutions even 10 cents short
of optimal. Well, that attitude
was costly for all concerned.
Instead of all the concentration
on saving the last 10 cents,
consider saving the first $1 million. Commonly in practice,
we can get close to optimality,
may be able to show that
we are within 1% of optimality,
so close the rest wouldn't even
buy a nice dinner, and see no
way in less than two weeks more of
computer time to try to
get an optimal solution.
So, concentrate on the big, fat doughnut, not the hole.
How to solve ILP problems is
a huge subject -- e.g., can start with George Nemhauser --
but a major fraction of the
techniques exploit some
surprisingly nice properties
of the simplex algorithm.
Right, likely the best
known approach is the
tree search technique of
branch and bound.
I took a number of Operations Research courses in undergrad, and one course was almost entirely on Linear Programming and Integer Programming. I strongly recommend engineers to get familiar with them because they are powerful tools for solving a whole class of really tricky problems.
I think this is one of those techniques that every serious computer scientist should have in their toolbox. Someone who knew some statistics, a smattering of something like SMT (enough to drive Z3) and some linear programming could routinely resemble Gandalf at any shop that has people struggling with difficult problems.
The main thing is not to turn into a cookbook guy. I've known a few people whose ability to just build an actual algorithm to solve some interesting problem has atrophied since they reach for a ILP solver the minute anything gets difficult.