by Faint 2286 days ago
Each patient would initially be assigned a drug or treatment at random. As treatment results accumulate, a multi-armed bandit algorithm would adjust the assignment probabilities so that the most effective treatment is used more often.

For example, in Thompson sampling the probability of choosing an option equals the probability that it is the best option, given the evidence so far.
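
To make that concrete, here is a rough sketch of Thompson sampling for treatments with a binary outcome (success or failure). It assumes a uniform Beta(1, 1) prior for each treatment and that every result is known before the next patient arrives. The function name, the `true_success_rates` parameter, and the simulated outcomes are made up for illustration, not taken from any real trial. Drawing one sample per treatment from its posterior and picking the largest selects each treatment with exactly the probability that it is the best one under the current evidence.

```python
import random


def thompson_sampling(true_success_rates, n_patients, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative sketch).

    true_success_rates is a hypothetical list of per-treatment success
    probabilities, unknown to the algorithm and used only to simulate
    patient outcomes.
    """
    rng = random.Random(seed)
    k = len(true_success_rates)
    successes = [0] * k
    failures = [0] * k

    for _ in range(n_patients):
        # One draw per treatment from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(k)
        ]
        # Treat the patient with whichever treatment drew the highest rate.
        arm = max(range(k), key=lambda i: samples[i])

        # Simulated outcome; a real trial would observe this instead.
        if rng.random() < true_success_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


if __name__ == "__main__":
    s, f = thompson_sampling([0.2, 0.5, 0.8], n_patients=2000)
    for i, (si, fi) in enumerate(zip(s, f)):
        print(f"treatment {i}: {si + fi} patients, {si} successes")
```

In a run like this, most patients should end up on the treatment with the highest success rate, with only a small share spent on the weaker ones.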

The aim is to maximize reward (successful treatments) while spending as little time as possible on exploration (testing less effective treatments).

1 comment

Delay is a real problem here.