|
You don't know what A is even if he tells you the first envelope is 60. A and 2A exist together counterfactually: if you have A, you have 2A; if you have 2A, you have A. This is a fundamental property of imperfect information games. You don't have State=A, ever, and you are lying to yourself when you pretend that you do. It is /why/ the reasoning error happens. State-based reasoning only works in perfect information games, because in that framing you don't have counterfactual subgames forcing the states to depend on each other - forcing you to play them both simultaneously, because you don't know which subgame you are in. A and 2A are together, so you can't propose A/2 without violating the A and 2A codependency imposed by not knowing which subgame you are in.
There are multiple A when you get told 60. They are 30, 60, and 120. This leads to three potential solutions to the expected value: 3/2(30), 3/2(60), 3/2(120). You can't tell which of these solutions you are in, because you don't know. Regardless, since the terminating R(KEEP) condition is the only one that is defined, P(SWITCH) = 1 is still either undetermined or 0, depending on how you solve the Bellman equations. Your policy decides your expected value. So earlier I was a little loose when I claimed the 3/2A relationship.
Honestly, the Wikipedia article is really terrible. It demands you stick to its formalism and declares you a no true Scotsman if you don't, but its approach is fundamentally wrong. Throw off its chains and consider the framing where you remove the requirement of thinking about imperfect information. People are bad at it, and if this is confusing you then it is /because/ you are struggling with the imperfect information. The way you convert between the two game types is this: first, instead of states -> information sets. So you only have one move in this game. You always have the Null information set. You always have the same situation.
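Here is a rough Python sketch of the "your policy decides your expected value" point, under my own reading of the setup: there is one null information set, the policy is a single number p = P(SWITCH), KEEP pays R(KEEP) = {0.5: A, 0.5: 2A}, and SWITCH just puts you back in the same situation. The function names, the choice of A = 1 as a unit, and the sweep count are assumptions for illustration, not anything from the thread or the article.

```python
def keep_reward(a: float) -> float:
    """Expected payoff of the base case R(KEEP) = {0.5: A, 0.5: 2A}."""
    return 0.5 * a + 0.5 * (2 * a)


def value_by_iteration(p_switch: float, a: float, sweeps: int = 2000) -> float:
    """Iterate V <- (1 - p) * R(KEEP) + p * V, starting from V = 0."""
    v = 0.0
    for _ in range(sweeps):
        v = (1 - p_switch) * keep_reward(a) + p_switch * v
    return v


def value_by_solving(p_switch: float, a: float):
    """Solve (1 - p) * V = (1 - p) * R(KEEP) directly.

    At p = 1 the equation collapses to 0 = 0, so V is undetermined
    and this returns None.
    """
    if p_switch == 1.0:
        return None
    return keep_reward(a)


A = 1.0  # the unknown smaller amount, used here only as a unit
for k in range(11):
    p = k / 10  # the coarse action grid 0.0, 0.1, ..., 1.0
    print(p, value_by_iteration(p, A), value_by_solving(p, A))
```

For every p below 1 both methods land on 3/2 A, because all the probability eventually drains into KEEP. At p = 1 the iteration stays at 0 and the direct solve has nothing to pin V down, which is the "undetermined or 0" case.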
So you're only allowed to have one choice, and that is it, every time, forever. So you always have to make the same decision. Secondly, instead of actions -> probability vectors over actions. To simplify and make it tractable, assume [0.0, 0.1, 0.2, ... 1.0] so the action space is small. You are choosing 'an action' to take and it's going to create multiple subgames. Focus on the recurrence relationship between those games. In particular, look at the base case: R(KEEP) = {0.5: A, 0.5: 2A}. Everything heads toward that base case except one thing: P(SWITCH)=1. It is a Markov process, and R(SWITCH) 'drains' so as to be R(KEEP), because any probability in it inevitably becomes R(KEEP) after enough iterations. Notice that {0.5: A, 0.5: 2A} is always true! It is true both when you have A and keep it and when you have 2A and keep it, because notice - you never ever have A. You have the null information set. You can't ever see a difference between these two things. The moment you force in the idea that you can actually define A to be a particular thing, you smash all over that recurrence relationship. You destroy the relationship. You claim there is no relationship between the policy function and the expected value even though /there is/. There is so much wrong with replacing R(SWITCH) with a fake reality where you can know you're actually in 1/2A when you can't. And that is what the equations are doing. They're saying you can replace the dependence on each other with a subgame that exists in a different reality. Let's say the other envelope was 120 when you got 60 - there is no 30 in existence, but your equation calls to replace the recurrence relationship with the idea that there is one. It doesn't make sense. |
A=60 is the amount in our chosen envelope - we are given this information in the variant in this thread. What we still don't know is if that is the larger amount (and thus the other envelope contains 30, and therefore x=30), or the smaller amount (and the other envelope contains 120, and therefore x=60).
The error is (in step 7) to calculate the arithmetic expectation of those absolute values, because they do not exist "together". The correct arithmetic mean can be obtained by considering the different conditions in which those values do exist, as described in [0]. However, the ratios do exist "together" - the other envelope contains either double or half of 60 - so we could instead calculate the geometric mean, of either the ratios or the corresponding absolute values, and obtain the correct result:
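A quick Python check of that arithmetic, as a sketch only. The variable names are mine, and the 60 is the amount given in this thread's variant:

```python
import math

chosen = 60.0                               # amount in the chosen envelope
candidates = [chosen / 2, chosen * 2]       # other envelope: 30 or 120
ratios = [c / chosen for c in candidates]   # 0.5 or 2.0

# Arithmetic mean of the two absolute values (the step-7 style calculation).
naive_arithmetic = sum(candidates) / len(candidates)

# Geometric mean of the absolute values, and of the ratios.
geometric_values = math.sqrt(math.prod(candidates))
geometric_ratios = math.sqrt(math.prod(ratios))

print(naive_arithmetic)   # 75.0, i.e. 5/4 of 60
print(geometric_values)   # 60.0, same as the chosen envelope
print(geometric_ratios)   # 1.0, no gain from switching
```

The arithmetic mean of 30 and 120 gives the familiar 5/4 factor, while the geometric mean of either the values or the ratios shows no advantage in switching.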
[0] https://en.wikipedia.org/wiki/Two_envelopes_problem#Other_si...
edit: changed “first envelope” to “chosen envelope” for clarity.