by ignoramous 2286 days ago
For someone unfamiliar with machine learning literature, can you please briefly explain how it helps here?
2 comments

in a world where you have many options and have to figure out which is best by repeated experimentation, but where experimentation itself has some cost, you have a multi-armed bandit problem. (the name is supposed to evoke a room full of slot machines -- you want to find the one with the highest payouts by repeatedly playing them, while losing as little money as possible before you find it.)

for example, if you have a few medications, you might start by assigning them all equally at random and then, as data comes in, use a bandit algorithm to gradually shift more and more new patients onto the ones that prove most effective, in a way that balances estimating the effects accurately against spending patients on the less effective drugs.

interestingly, the first formulation of the problem is due to Dr. Thompson at the Yale Pathology Department in the 1930s; he came up with Thompson sampling. So these are techniques that were originally designed for medical trials.

I think that designers of medical trials probably do have a good grasp of this stuff (some statistical estimators that originated in the medical world have even been successfully imported into reinforcement learning/MAB research) so probably they would be using a bandit-like technique if they felt it made sense.

Coordination is complicated. Bandit problems also assume a clear, instant payout per lever pull. I suppose WHO could do this in the back end.
Every patient would be treated with a randomly chosen drug/treatment. As treatment results accumulate, a multi-armed bandit algorithm would adjust the assignment probabilities so that the most effective treatment is used more often.

For example, in Thompson sampling the probability of choosing an option is equal to the probability that the option is the best one, given the evidence so far.
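
A minimal sketch of that rule in Python, assuming each treatment either works or does not (a Bernoulli outcome) and a uniform Beta(1, 1) prior on each success rate. The names `thompson_choose` and `run_trial` and the success rates used below are made up for illustration. Drawing one sample per arm from its posterior and taking the largest picks each arm with exactly the posterior probability that it is the best one.

```python
import random


def thompson_choose(successes, failures, rng):
    """Pick an arm by drawing one sample from each arm's Beta posterior."""
    # Beta(1 + s, 1 + f) is the posterior for a success rate under a
    # uniform prior after s successes and f failures (an assumption here).
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def run_trial(true_rates, n_patients, seed=0):
    """Simulate a trial where each outcome is observed immediately."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_patients):
        arm = thompson_choose(successes, failures, rng)
        # Hypothetical ground truth: the treatment works with this probability.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = run_trial([0.3, 0.5, 0.7], n_patients=2000, seed=1)
    print("patients per treatment:", [a + b for a, b in zip(s, f)])
```

With success rates this far apart, most patients should end up on the third treatment, while the other two still get a few assignments early on.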

The aim is to maximize reward (successful treatments) while spending as little time as possible on exploration (testing less effective treatments).

Delay is a real problem here.
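
To see why, here is a rough simulation, again assuming Bernoulli outcomes and Beta(1, 1) priors. The function `run_delayed_trial`, its `delay` parameter (how many later patients are enrolled before an outcome becomes known), and the rates are all hypothetical. Until outcomes start arriving, the algorithm has nothing to learn from and assigns treatments essentially at random.

```python
import random
from collections import deque


def run_delayed_trial(true_rates, n_patients, delay, seed=0):
    """Thompson sampling where each outcome is revealed `delay` patients late."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    pending = deque()  # (arm, outcome) pairs that have not been observed yet

    for _ in range(n_patients):
        # Reveal the oldest outcome once more than `delay` are waiting.
        if len(pending) > delay:
            arm, worked = pending.popleft()
            if worked:
                successes[arm] += 1
            else:
                failures[arm] += 1

        # Choose using only the outcomes observed so far.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = max(range(k), key=draws.__getitem__)
        pulls[arm] += 1
        pending.append((arm, rng.random() < true_rates[arm]))

    return pulls


if __name__ == "__main__":
    rates = [0.3, 0.5, 0.7]
    print("no delay:  ", run_delayed_trial(rates, 2000, delay=0, seed=1))
    print("delay 500: ", run_delayed_trial(rates, 2000, delay=500, seed=1))
```

In the delayed run, roughly the first 500 patients are split evenly across treatments, so fewer patients overall should land on the best one.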