by jerf
2286 days ago
"Multi armed bandit methods work best with immediate success-fail metrics. This one has time delays." Well, sure, but everything works best with immediate success-fail metrics. One of the most basic results from learning theory is that the longer the latency between stimulus and response, the slower the learning rate can be. I'm not sure how multi-armed bandit is special in this regard. All learning techniques are going to be susceptible to the problem you outline in your second paragraph. This is one of those "there is no perfect solution" situations. It's really easy to say that out loud. It's quite difficult to internalize it. (Also, just as a note to your other post, bear in mind that our hard-core "social distancing" efforts in the US are just about to reach approximately one incubation period. It is only this week that we're going to start seeing the results of that, and they'll phase in as slowly as our efforts 1-2 weeks ago did. My state just went to full lockdown today, though we've been on a looser lockdown for a week before that.)
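To make the latency point concrete, here is a minimal sketch of a bandit whose feedback arrives late. It is not anything from the trial under discussion: the algorithm (Thompson sampling on Bernoulli arms), the function name `run_delayed_thompson`, and the success rates are all assumptions picked for illustration. Each outcome only becomes visible `delay` steps after the pull, and the function reports how much expected reward was lost relative to always pulling the best arm.

```python
import random
from collections import deque


def run_delayed_thompson(true_rates, horizon, delay, seed=0):
    """Thompson sampling on Bernoulli arms with delayed feedback.

    Hypothetical illustration: the outcome of a pull at step t only
    becomes visible to the learner at step t + 1 + delay.
    Returns (expected regret, pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    # Beta(1, 1) prior per arm: alpha counts successes, beta counts failures.
    alpha = [1] * n_arms
    beta = [1] * n_arms
    pending = deque()  # entries are (step_when_visible, arm, reward)
    best = max(true_rates)
    regret = 0.0
    pulls = [0] * n_arms

    for t in range(horizon):
        # Fold in every outcome whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, seen_arm, seen_reward = pending.popleft()
            if seen_reward:
                alpha[seen_arm] += 1
            else:
                beta[seen_arm] += 1

        # Draw one plausible success rate per arm and pull the best-looking arm.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])

        reward = rng.random() < true_rates[arm]
        pending.append((t + 1 + delay, arm, reward))
        pulls[arm] += 1
        regret += best - true_rates[arm]

    return regret, pulls


if __name__ == "__main__":
    # Made-up success rates; compare regret as the feedback delay grows.
    for d in (0, 50, 500):
        r, p = run_delayed_thompson([0.2, 0.5, 0.8], horizon=2000, delay=d)
        print(f"delay={d:4d}  regret={r:7.1f}  pulls={p}")
```

When the delay is at least as long as the run, nothing is ever observed and the learner is effectively choosing at random. Swapping in a different learning rule should not remove that effect, which is the point: the cost comes from the delay, not from the bandit.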
|
Which medicine looks effective? Which medicine gets people out of the hospital faster? What underlying conditions interacted badly with given medicines? These questions do not have to be asked up front. But they can be answered afterwards. And knowing the answers matters.
Here is an example. Suppose that we find one medication that gets people out of bed faster but kills some. In areas with overwhelmed hospitals, cycling people through the bed may save net lives. If your hospital is not overwhelmed, you wouldn't want to give that medicine. Now, I'm not saying that any of these medicines will lead to a conclusion like that. But they could. And if one did, I definitely want human judgement to be applied about when to use it.
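A rough back-of-the-envelope version of that example, with every number invented purely for illustration (none of them come from any real drug or hospital). The function `daily_deaths` is a hypothetical helper that assumes a steady state in which a ward can admit `beds / stay_days` patients per day, and that anyone who cannot be admitted faces a much higher death rate.

```python
def daily_deaths(arrivals_per_day, beds, stay_days,
                 death_rate_treated, death_rate_turned_away):
    """Expected deaths per day under a crude steady-state model.

    Assumption: a ward with `beds` beds and an average stay of
    `stay_days` can admit at most beds / stay_days patients per day.
    """
    capacity_per_day = beds / stay_days
    admitted = min(arrivals_per_day, capacity_per_day)
    turned_away = arrivals_per_day - admitted
    return admitted * death_rate_treated + turned_away * death_rate_turned_away


# Invented numbers: the "fast" drug halves the stay but doubles in-hospital deaths.
STANDARD = dict(stay_days=10, death_rate_treated=0.02)
FAST = dict(stay_days=5, death_rate_treated=0.04)
BEDS = 100
TURNED_AWAY_RATE = 0.20

if __name__ == "__main__":
    for label, arrivals in (("overwhelmed", 30), ("not overwhelmed", 8)):
        std = daily_deaths(arrivals, BEDS, death_rate_turned_away=TURNED_AWAY_RATE, **STANDARD)
        fast = daily_deaths(arrivals, BEDS, death_rate_turned_away=TURNED_AWAY_RATE, **FAST)
        print(f"{label}: standard={std:.2f} deaths/day, fast={fast:.2f} deaths/day")
```

Under these made-up inputs the faster drug comes out ahead only when arrivals exceed bed capacity, and behind otherwise. The model says nothing about where the threshold sits for a real hospital, which is where the human judgement comes in.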