Hacker News new | ask | show | jobs
by jdleesmiller 1754 days ago
(Author of the article here.)

Thanks, I can vouch for the Bertsekas book. I'd also recommend:

Russell, Stuart, and Peter Norvig. "Artificial intelligence: a modern approach." (2002).

Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018. (As another commenter just pointed out.) They have a great example with a recycling robot.

3 comments

If you buy "Artificial intelligence: a modern approach," I suggest coughing up the extra money for the latest version. It's several times more expensive than older copies but they really updated the content.
I'd add, R. T. Rockafellar has an interesting technique called scenario aggregation.
I might have mentioned that for your example, we can have good confidence in the Markov assumption -- we just say we need to trust the random number generator.

In real world practice, i.e., where we don't get to drive the stochastic part of the problem with our own random number generator, the Markov assumption (past and future conditionally independent given the present) can be, commonly is, difficult to justify.

But, then in just the theory, every stochastic process is Markov -- just let the state be the history -- I intend this as a joke, but actually, technically in the math, it is true.

If we DO make the state the history and have a problem in continuous time, then, at least in theory, we are conditioning on uncountably infinitely many random variables, and just defining how to do that is a bit interesting.

Then for our decisions. policy, we can encounter the problem of measurable selection, also a bit interesting and surprising.

I.e., dynamic programming in continuous time gets us into measure theory, the Radon-Nikodym theorem, etc.

Stochastic dynamic programming, when find a good application and can build a good solution, i.e., keep the computational demands within the super computer range, can seem really smart, of course is not prescient but first cut, intuitively, can seem to be. Your solution likely also seems prescient.

More applications would be nice.

Here is a potential, potentially large, collection of new applications: Develop a spreadsheet, for an example, for, say, budgeting for the next 5 years. To have a simple description, have one column for each month, for a initial column and then one more for the end of each of the 60 months, and have one row for each variable. Some of the spreadsheet cells might have random quantities (have them all independent, to keep the Markov assumption easy to justify), and some of the cells might be empty and available for decisions.

Now, presto, bingo, whether we knew or intended it or not, we now have formulated a problem in discrete time stochastic dynamic programming, and since in practice problem formulation is a common obstacle, we have maybe made some progress.

In principle at this point the software for a solution is well defined. And since long ago people wrote general purpose stochastic dynamic programming software, the software we would need is in a sense actually simple (if we don't care about execution time).

Now just feed this problem into a suitably large computer and get the resulting decisions and the expected earnings, we wanted to maximize, at the end of the 5 years.

If make the software more complicated, then can put in some ideas to make the computations faster by some factors of 10.