
by JoshCole 1486 days ago

You don't know what A is, even if he tells you the first envelope is 60. A and 2A exist together counterfactually: if you have A, you have 2A; if you have 2A, you have A. This is a fundamental property of imperfect information games. You don't have State=A, ever, and you are lying to yourself when you pretend that you do. It is /why/ the reasoning error happens. State-based reasoning only works in perfect information games because in that framing you don't have counterfactual subgames forcing the states to depend on each other - forcing you to play them both simultaneously, because you don't know which subgame you are in.

A and 2A are together, so you can't propose A/2 without violating the A and 2A codependency imposed by not knowing which subgame you are in.

There are multiple A when you get told 60. They are 30, 60, and 120. This leads to three potential solutions to the expected value: 3/2(30), 3/2(60), 3/2(120). You can't tell which of these solutions you are in, because you don't know. Regardless, since the terminating R(KEEP) condition is the only one that is defined, P(SWITCH) = 1 is still either undetermined or 0, depending on how you solve the Bellman equations. Your policy decides your expected value. So earlier I was a little loose when I claimed the 3/2A relationship.

Honestly, the Wikipedia article is really terrible. It demands you stick to its formalism and declares you no true Scotsman if you don't, but its approach is fundamentally wrong. Throw off its chains and consider the framing where you remove the requirement of thinking about imperfect information. People are bad at it, and if this is confusing you then it is /because/ you are struggling with the imperfect information.

The way you convert between the two game types is this. First, instead of states you have information sets. You only have one move in this game, and you always have the null information set, so you always face the same situation. That means you are only allowed one choice, and it is the same choice every time, forever. Second, instead of actions you have probability vectors over actions. To simplify and make it tractable, assume P(SWITCH) is one of [0.0, 0.1, 0.2, ... 1.0] so the action space is small. You are choosing 'an action' to take and it's going to create multiple subgames. Focus on the recurrence relationship between those games. In particular, look at the base case: R(KEEP) = {0.5: A, 0.5: 2A}.

Everything heads toward that base case except one thing: P(SWITCH)=1. It is a Markov process, and R(SWITCH) 'drains' into R(KEEP), because any probability in it inevitably becomes R(KEEP) after enough iterations.

Notice that {0.5: A, 0.5: 2A} is always true! It is true both when you have A and keep it and when you have 2A and keep it, because - notice - you never ever have A. You have the null information set. You can't ever see a difference between these two things.
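
One way to make the 'drain' concrete is to iterate the recurrence directly. This is only a minimal sketch under stated assumptions: the smaller amount is normalized to A = 1, the policy is a single fixed switch probability used every round, and reward is only realized on KEEP. The names `evaluate_policy`, `p_switch` and `rounds` are made up for the illustration.

```python
from fractions import Fraction

def evaluate_policy(p_switch, rounds, A=Fraction(1)):
    """Iterate the KEEP/SWITCH recurrence for a fixed switch probability.

    Returns (realized_value, undecided_mass): the expected reward collected
    from KEEP within `rounds` rounds, and the probability still switching.
    """
    p = Fraction(p_switch)
    # Probability mass over the hidden subgames: holding A or holding 2A.
    holding = {A: Fraction(1, 2), 2 * A: Fraction(1, 2)}
    realized = Fraction(0)
    for _ in range(rounds):
        # KEEP terminates and pays what is held: R(KEEP) = {0.5: A, 0.5: 2A}.
        realized += sum((1 - p) * mass * amount for amount, mass in holding.items())
        # SWITCH swaps the envelopes and carries the remaining mass forward.
        holding = {A: p * holding[2 * A], 2 * A: p * holding[A]}
    return realized, sum(holding.values())

# The coarse action space [0.0, 0.1, ..., 1.0] from the text.
for tenths in range(0, 11):
    p = Fraction(tenths, 10)
    value, undecided = evaluate_policy(p, rounds=200)
    print(f"P(SWITCH)={float(p):.1f}  realized EV={float(value):.4f}A  "
          f"still switching={float(undecided):.4f}")
```

Under these assumptions every policy with P(SWITCH) < 1 approaches 3/2 A as the rounds go on, while P(SWITCH) = 1 keeps all of its mass in SWITCH and never realizes anything.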

The moment you force in the idea that you can actually define A to be a particular thing, you smash all over that recurrence relationship. You destroy the relationship. You claim there is no relationship between the policy function and the expected value even though /there is/. There is so much wrong with replacing R(SWITCH) with a fake reality where you can know you're actually in 1/2A when you can't. And that is what the equations are doing. They're saying you can replace the dependence on each other with a subgame that exists in a different reality. Let's say the other envelope held 120 when you got 60 - there is no 30 in existence, but your equation calls to replace the recurrence relationship with the idea there is one. It doesn't make sense.

2 comments

mrow84 1486 days ago

You are conflating the quantity A, which is a label for the (unknown in the original formulation) amount in the chosen envelope, with the "smaller amount" - labelled x in the Wikipedia article.

A=60 is the amount in our chosen envelope - we are given this information in the variant in this thread. What we still don't know is if that is the larger amount (and thus the other envelope contains 30, and therefore x=30), or the smaller amount (and the other envelope contains 120, and therefore x=60).

The error is (in step 7) to calculate the arithmetic expectation of those absolute values, because they do not exist "together". The correct arithmetic mean can be obtained by considering the different conditions in which those values do exist, as described in [0]. However, the ratios do exist "together" - the other envelope contains either double or half of 60 - so we could instead calculate the geometric mean, of either the ratios or the corresponding absolute values, and obtain the correct result:

    (2 * 0.5) ** 0.5 = 1
    (120 * 30) ** 0.5 = 60

[0] https://en.wikipedia.org/wiki/Two_envelopes_problem#Other_si...
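
The two geometric means can be checked next to the arithmetic mean from step 7. This is a minimal sketch; the variable names are illustrative, and 60 is the amount given in this thread's variant.

```python
import math

chosen = 60
candidates = [chosen * 2, chosen / 2]   # the other envelope: 120 or 30

# Step 7's arithmetic mean treats 120 and 30 as if they existed together.
arithmetic = sum(candidates) / len(candidates)

# The geometric mean of the absolute values, and of the ratios 2 and 1/2.
geometric = math.sqrt(candidates[0] * candidates[1])
ratio_mean = math.sqrt(2 * 0.5)

print(arithmetic)  # 75.0
print(geometric)   # 60.0
print(ratio_mean)  # 1.0
```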

edit: changed “first envelope” to “chosen envelope” for clarity.


JoshCole 1485 days ago

On further reflection, you're right, I'm conflating it.

My correction is still valid though. You're not handling step ten properly. You didn't work over the information sets, didn't solve the actual graph that is the game, didn't handle the under-specified policy function.

To try and show you that your solution isn't the actual solution: well, both options have the same EV. So I choose switch every time, because why not. As you are no doubt aware I never get to have EV because I'm constantly swapping. The sixty is a mirage. For my policy choice, the answer was undefined or zero depending on how you write it down. But you told me they had the same EV. So if they did, why did my choice not produce that EV? Ergo, your solution only appears to be giving you the EV.

Think about that for a while and you'll start to realize why I homed in on specifying a recurrence relationship with the terminal keep node and why I'm so eager to escape the trap of their flawed problem model.


JoshCole 1485 days ago

Am I really the one who is getting things conflated? Go back to the problem and look at step ten. The problem allows infinite swapping. You are doing an EV calculation, but your analysis is fundamentally flawed. Assign the policy of always switching. Your analysis is claiming that EVs can be calculated, but they can't. The EV is either zero or undefined for switching, because the recurrence relationship is an infinite sequence. Since 60 != 0 and 60 != undefined, but you claim that they are, something is very wrong with your calculations. You're using the wrong formalism. The policy is supposed to be able to vary, but you're treating EV as a concept that isn't dependent on policy.

Let's take a step back and learn the important lesson for more complex situations. Your policy influences your expected value. Not keeping that in mind is going to destroy the ability to correctly calculate expected value. You aren't trying to search for the best thing to do on the basis of expected value. You are searching for the right policy to provoke a high expected value. The difference is subtle, but essential.

How do we correct it? Well, the right formalisms that allow you to search for the correct policy come from several fields, but one of them is game theory. In game theory, it is considered incorrect to do state-based reasoning under imperfect information. This is because you aren't in a state - you are in an information set. When you are playing the game you have to consider every game you could be in, because you don't know which you are in.
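
As a rough illustration of the difference between a state and an information set, the sketch below groups the hidden states of this game by what the player can observe. It assumes the original formulation, where nothing is observed; `STATES`, `observe` and `information_sets` are hypothetical names, not part of any library.

```python
from collections import defaultdict
from fractions import Fraction

# Hidden states after the chance node, each with probability 1/2.
# "A" stands for the smaller amount; it is never revealed to the player.
STATES = {"holding A": Fraction(1, 2), "holding 2A": Fraction(1, 2)}

def observe(state):
    """What the player can see in a given state: nothing, in every state."""
    return None

def information_sets(states):
    """Group states that the player cannot tell apart."""
    groups = defaultdict(list)
    for state in states:
        groups[observe(state)].append(state)
    return dict(groups)

info_sets = information_sets(STATES)
print(info_sets)  # {None: ['holding A', 'holding 2A']}

# A policy is keyed by information set, not by state, so both hidden
# states are forced to share the same switch probability.
policy = {None: Fraction(3, 10)}
for state in STATES:
    print(state, "-> P(SWITCH) =", policy[observe(state)])
```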

This is a second problem with the analysis, but I think you corrected this one.

They ask to be able to translate this into more complex situations. So the general lesson here is about considering counterfactuals in your analysis. An example of this in practice is Jeff Bezos talking about his decision to found Amazon on the basis of regret minimization, on account of his theory about how he would feel in various counterfactual futures. He didn't consider just one EV, the founding of Amazon, but also other EVs like founding Amazon and failing and also not founding Amazon and doing other things.

I think I get why you think I'm conflating A, but I'm actually trying to point out that the Wikipedia article is conflating A, and so it's hard to have a productive discussion due to our inheritance of their misuse of terms. I don't want to conflate A, but the Wikipedia article defined A in their expected value calculation and in that equation it ends up taking on a different meaning to what it means when it is defined to be 60. And their meaning ends up claiming things like 1=2 in practice, because of the properties of the hidden counterfactual part of their equations - just because they neglect to show them doesn't mean they don't exist in the correct mathematics. So the logical contradiction is there - which is exactly the thing the problem asks us to identify.


mrow84 1486 days ago

I find your exposition difficult to follow, perhaps you could write out your proposed solution without the accompanying explanatory text, so that we can clearly see how it resolves the paradox.


JoshCole 1485 days ago

Okay. Let's start with the easy analysis: the version of the game that terminates after only one switch.

I[null] = {0.5: A, 0.5: 2A}

We get the expected change in EV for switching like this:

    A = 1  # placeholder for the unknown smaller amount; the sum is 0 for any A
    (
        # The expected gain of switching if we are in subgame A
        (2*A - A) * 0.5

        +

        # The expected loss of switching if we are in subgame 2A
        (A - 2*A) * 0.5
    )

See how we had to consider two different subgames? We didn't know whether we had A or 2A. We only knew we had I[null].

You had to consider the benefit of switching from A to 2A and the cost of switching from 2A to A. You had to consider the factual reality you were in, but also the counterfactual reality you were not in.
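
The sum above can be evaluated for a few values of A to confirm that it vanishes every time. This is a small sketch; `switch_gain` is a name chosen for illustration, and the sample values are arbitrary.

```python
from fractions import Fraction

def switch_gain(A):
    """Expected change from one switch, summed over both subgames of I[null]."""
    half = Fraction(1, 2)
    gain_if_holding_A = (2 * A - A) * half    # factually A, other envelope 2A
    loss_if_holding_2A = (A - 2 * A) * half   # factually 2A, other envelope A
    return gain_if_holding_A + loss_if_holding_2A

for A in (1, 30, 60, 120):
    print(A, switch_gain(A))   # the gain is 0 for every A
```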

Here is the reality of the first part of the game tree:

   ChanceNode(0.5)
  /           \ 
 A            2A

In a perfect information game, A and 2A are disconnected because they are in two different subtrees, but watch what happens when we convert from the perfect information view to the imperfect information view of the world that we are actually dealing with:

        0.5
      /    \
    I[null] I[null]

We have two information sets now as the branches. One is actually A, but when we do reasoning about it we need to counterfactually consider the other parts of the information set.

                      0.5
      /                                    \
    {Factually A, Counterfactually 2A}     {Factually 2A, Counterfactually A}

So I hope you're starting to realize that A and 2A are fundamentally connected. You can't reason about A without including 2A. You can't reason about 2A without including A. This is really, really important to one of several reasons that their analysis is wrong. You don't know which branch you are in. So you can't condition on being in A, like they do in the Wikipedia article.

Look at the trouble they run into when they /do/ condition on A. I just showed you these counterfactuals exist, and they still exist when they condition. Their conditioning treats the situation as if you can deal with just A and just 2A. So they get a 2A case and an A/2 case, both of which have their own counterfactuals. Notice what just happened when they did that.

In the A/2 case since we are in imperfect information there is a counterfactual associated with it. What is that counterfactual? It is A! So they don't just have A/2 they also have counterfactual A.

And now look at the other case. 2A has a counterfactual associated with it too. What is it? A.

So they have this:

[F: A, CF: 2A], [F: A/2, CF: A]

And what I'm trying to point out to you is that they just declared A=A/2 and 2A=A, because they neglected the counterfactual relationships.
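
To see the two incompatible pairs concretely, the sketch below spells out which envelope pair each branch of the article's step-7 formula presupposes. It assumes the thread's example value of 60 for the chosen envelope; the function and variable names are illustrative only.

```python
from fractions import Fraction

def step7_expectation(a):
    """The article's step-7 formula: 1/2 * 2a + 1/2 * a/2."""
    return Fraction(1, 2) * (2 * a) + Fraction(1, 2) * Fraction(a, 2)

a = 60
# Each branch presupposes a different pair of envelopes (smaller, larger).
pair_if_a_is_smaller = (a, 2 * a)     # (60, 120)
pair_if_a_is_larger = (a // 2, a)     # (30, 60)

print(step7_expectation(a))           # 75, i.e. 5/4 * a

# Within one fixed pair, keeping and switching have the same expectation:
# the mean of the two envelopes. The two pairs give different means, so the
# step-7 formula mixes values from two different games.
for smaller, larger in (pair_if_a_is_smaller, pair_if_a_is_larger):
    print((smaller, larger), Fraction(smaller + larger, 2))
```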

You can't condition on A in the subgame; you don't have perfect information - you don't have A. You have I[null]. Even when you get told 60 you still have I[null].

/A/ isn't 60, and it can't be, because you don't know which subgame you are in. A isn't given. If you knew A, if it was possible to know A, you wouldn't be in an imperfect information game. A is defined in point one, yes - but it is also used in point seven, and when it is used there we get the logical contradiction.
