by megrimlock
4205 days ago
OK, I'm stumped. If I've parsed your description correctly, we get no information about our opponent's actions or the results until the end. Without any way to observe their strategy, it seems like you do want to maximize the expected value of your own result, so I'm curious about the counterexample.

How do we maximize EV? A single throw is uniform on [0,1] (its pdf is 1 for x in [0,1]), so its EV is 0.5. The question is how to improve on a single throw by deciding to re-throw. A re-throw is independent and has the same EV of 0.5, so any gain has to come from choosing when to re-throw.

Say our strategy uses a threshold A: we re-throw any result below A and keep the second throw. Because x is uniform, the probability that we re-throw is exactly A. The EV of the strategy is then

EV(A) = A * EV(second_throw) + (1-A) * EV(first_throw | first_throw >= A).

We keep the first throw only when it lands in [A,1], and given that, it is uniform on [A,1], so its conditional EV is (1+A)/2 (the integral of x from A to 1, which is (1-A^2)/2, divided by the probability 1-A). So the EV of the whole strategy is

EV(A) = A/2 + (1-A)(1+A)/2 = (1 + A - A^2)/2.

Setting the derivative (1-2A)/2 to zero puts the maximum at A = 0.5, giving an EV of 5/8. So how do you do better?
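As a quick sanity check of the arithmetic above, here is a minimal Python sketch. It assumes exactly one optional re-throw, that the second throw is always kept, and that every throw is uniform on [0,1]. The names threshold_ev and simulate_ev are made up for illustration: the first evaluates the closed form, the second estimates the same threshold strategy by Monte Carlo.

```python
import random


def threshold_ev(a: float) -> float:
    """Closed-form EV of 're-throw any first throw below a' (hypothetical helper)."""
    # With probability a we re-throw and get a fresh uniform throw worth 1/2 on average;
    # with probability 1 - a we keep a throw that is uniform on [a, 1], worth (1 + a) / 2.
    return a * 0.5 + (1 - a) * (1 + a) / 2


def simulate_ev(a: float, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same strategy (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.random()  # first throw, uniform on [0, 1)
        if x < a:
            x = rng.random()  # independent second throw, always kept
        total += x
    return total / trials


# Scan thresholds on a 0.001 grid to locate the maximum of the closed form numerically.
best_a = max((i / 1000 for i in range(1001)), key=threshold_ev)
print(best_a, threshold_ev(best_a))  # prints 0.5 0.625
print(simulate_ev(best_a))  # a sampled estimate near 0.625
```

The grid scan is only there to confirm the A = 0.5 optimum without calculus; simulate_ev(0.0) never re-throws and so should come out near the single-throw EV of 0.5.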