by Faint
2286 days ago
|
|
Every patient would be assigned a drug or treatment at random. As treatment results accumulate, a multi-armed bandit algorithm adjusts the assignment probabilities so that the most effective treatment is used more often. For example, in Thompson sampling the probability of choosing an option equals the probability that it is the best option given the evidence so far. The aim is to maximize reward (successful treatments) while spending as little time as possible on exploration (testing less effective treatments). |
|
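To make the Thompson sampling idea concrete, here is a minimal sketch. It assumes each treatment has a binary outcome (success or failure) and starts every treatment with a uniform Beta(1, 1) prior. The function name `thompson_sampling` and the success rates in the usage note are made up for illustration; a real trial would not know the true rates and would observe outcomes instead of simulating them.

```python
import random


def thompson_sampling(true_success_rates, n_patients, seed=0):
    """Simulate assigning patients to treatments with Beta-Bernoulli Thompson sampling.

    true_success_rates is a hypothetical list of per-treatment success
    probabilities, used here only to simulate patient outcomes.
    Returns (successes, failures), one count per treatment.
    """
    rng = random.Random(seed)
    n_arms = len(true_success_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms

    for _ in range(n_patients):
        # Draw one plausible success rate per treatment from its posterior,
        # Beta(successes + 1, failures + 1) under a uniform prior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Picking the largest draw selects each treatment with probability
        # equal to its posterior probability of being the best one.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulated outcome for this patient (assumption: independent Bernoulli).
        if rng.random() < true_success_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures
```

Calling, say, `thompson_sampling([0.3, 0.5, 0.8], 2000)` and adding the success and failure counts per treatment shows how many patients each treatment received. Early on the assignments are spread out, and as evidence accumulates most patients end up on the treatment with the highest success rate.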