| HN Mirror



	by pcmonk 1945 days ago
	I think your n_subjects is too low. You need that to be high enough or you'll miss those low-probability winners that bring up the average.


nonameiguess 1945 days ago

I did it with more subjects and it doesn't make a difference. The only reason I reduced to 100 is because the plot is unreadable otherwise.

Looking at the Julia code, I think what he is doing wrong is making all wins worth $.50 and all losses worth $.40, but the bet computes a win or loss based on your current wealth, not your starting wealth. His formula would work if you were always betting $1 no matter what your bankroll was, but that isn't what the actual post stipulates.
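To make the distinction concrete, here is a minimal sketch that plays the same sequence of flips under both rules. It is not the Julia code under discussion; the function names `additive_wealth` and `multiplicative_wealth` are made up for illustration, and the 50% gain / 40% loss figures are taken from the comment above.

```python
def additive_wealth(flips, start=1.0):
    """Fixed $1 stake: each win adds $0.50 and each loss subtracts $0.40."""
    wealth = start
    for heads in flips:
        wealth += 0.50 if heads else -0.40
    return wealth


def multiplicative_wealth(flips, start=1.0):
    """Stake is the whole bankroll: a win multiplies it by 1.5, a loss by 0.6."""
    wealth = start
    for heads in flips:
        wealth *= 1.5 if heads else 0.6
    return wealth


# Five wins and five losses, alternating (an arbitrary example sequence).
flips = [True, False] * 5
print("fixed $1 stake:     ", additive_wealth(flips))
print("stake = bankroll:   ", multiplicative_wealth(flips))
```

With an equal number of wins and losses, the fixed-stake rule ends above the starting $1 (about $1.50 here), while the bankroll-proportional rule ends below it (about $0.59, since each win/loss pair multiplies wealth by 0.9).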


jackcosgrove 1945 days ago

So if you get lucky you de-risk and make small bets comparable to your initial winning bet, rather than betting it all.

Sounds like a good strategy whether in Vegas or Wall Street.


maximilianroos 1945 days ago

You're so fixed to your conclusion that you're now reading code wrong.

If you don't trust the Julia code, try running with the same parameters in Python.


nonameiguess 1945 days ago

I don't understand why you think changing the number of participants changes the ensemble average. I just ran it with 1,000,000 participants and 1,000 trials and ended up with an ensemble average of $0.07 on the 1,000th trial, trending toward 0. The only difference is the simulation took longer. The curve looks exactly the same. The ensemble average trends upward until about 500 trials, then trends downward and keeps doing so forever.

My code is right up there and you can run it. You can even just run the OP's notebook that he provided but increase the number of trials. Change the "num_flips_per_sim" parameter he provides in cell 6 to anything over 500 and you will always get sum(count_lose_capital) == everyone.
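For readers without the notebook at hand, the experiment described here can be sketched roughly as follows. This is not the commenter's code or the OP's notebook; `simulate`, its parameter names, and the seed are assumptions chosen for illustration, and the participant count is kept at 1,000 so it runs quickly.

```python
import numpy as np


def simulate(n_participants=1000, n_flips=1000, seed=0):
    """Return an (n_participants, n_flips) array of wealth after each flip.

    Everyone starts with $1; heads multiplies wealth by 1.5, tails by 0.6.
    """
    rng = np.random.default_rng(seed)
    heads = rng.random((n_participants, n_flips)) < 0.5
    factors = np.where(heads, 1.5, 0.6)
    return np.cumprod(factors, axis=1)


wealth = simulate()
sample_mean = wealth.mean(axis=0)          # ensemble average at each step
sample_median = np.median(wealth, axis=0)  # what a typical participant holds

print("sample mean after last flip:  ", sample_mean[-1])
print("sample median after last flip:", sample_median[-1])
print("analytical mean 1.05**n_flips:", 1.05 ** wealth.shape[1])
```

Comparing the sample mean against the analytical value `1.05 ** n_flips` shows how far a finite ensemble can sit from the expectation at long horizons.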


maximilianroos 1945 days ago

Take the outside view here: 3-4 people have commented, all disagreeing with you. One of them has offered an explanation of why your experiment is poorly designed, and I've given you code which produces a different result.

The appropriate response to that is introspection, not repetition.


nonameiguess 1944 days ago

Look at my other reply, which was above but is now below. The number of participants required to be likely to find any who stay above water gets very high eventually, much higher than 1,000,000. This can just be calculated.

After 500 trials, you need 279 heads to stay above $1 net wealth. 1.5^278 * 0.6^222 = 0.50 and 1.5^279 * 0.6^221 = 1.26, so that's your breakeven point. The probability of getting at least 279 heads in 500 coin flips is 0.005364, so with 1,000 participants, you expect to see about 5 still above water.

At 1000 trials, the breakeven point becomes 558 and the probability of getting at least that many heads in 1000 flips is 0.00013614. So the expected number of people who stay above water in a pool of 1000 participants is 0. Out of 1,000,000, it is about 136, so you're right, there are some. But at that point it's not nearly enough: we're not sampling the ones whose wealth is large enough to actually bring the mean back up, so it keeps trending to 0 in any sample of a practical size.
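The breakeven count and tail probability can be checked with a short script. This is a sketch using only the standard library; `breakeven_heads` and `prob_at_least` are illustrative names, and a fair coin is assumed.

```python
import math


def breakeven_heads(n_flips, up=1.5, down=0.6):
    """Smallest number of heads h with up**h * down**(n_flips - h) >= 1."""
    for h in range(n_flips + 1):
        # Compare in log space to avoid overflow and underflow.
        if h * math.log(up) + (n_flips - h) * math.log(down) >= 0:
            return h
    return None


def prob_at_least(n_flips, h):
    """P(at least h heads in n_flips fair flips), via exact integer counts."""
    favourable = sum(math.comb(n_flips, k) for k in range(h, n_flips + 1))
    return favourable / 2 ** n_flips


for n in (500, 1000):
    h = breakeven_heads(n)
    print(n, "flips: need", h, "heads, probability", prob_at_least(n, h))
```

Multiplying the printed probability by the number of participants gives the expected number still above $1.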

This is a pretty interesting property of this problem, really. It's not related to ergodicity, but just to the share of probability mass that lies above $1 trending asymptotically toward 0 even though the analytical expectation trends toward infinity. I don't know that there is even a word for that, but seemingly which of those moves faster toward its limit determines what sample ensemble average you really see when the number of realized states is far less than the number of possible states.

This probably has some implications for Pascal's Mugger type problems in decision theory. If some course of action has potentially infinite future payoff and destroys expected utility calculations because of that, but the expected number of possible universes in which a positive outcome happens at all trends toward 0 faster than the expectation trends toward infinity, that gives a decision rule. In this specific case, don't take this bet, at least not in an indefinitely repeating form.


maximilianroos 1944 days ago

Thanks for the thoughtful reply. The breakeven analysis is good!


concreteblock 1944 days ago

No matter how many test subjects you use, if you run the experiment for a very long time, everyone goes bankrupt and will never recover.

More precisely there is a finite time after which no-one ever passes above $0.0000000000000001.

That is a mathematical theorem.

This doesn’t depend on the number of test subjects, and you can add as many zeroes as you want.

Therefore in the long run the mean outcome is 0.

Forgive me if I have misinterpreted what you are trying to say.

Edit: I’ve just realized that I have indeed missed your point.


kgwgk 1944 days ago

No matter how long you run the experiment, if you use enough subjects, some of them will win an absurdly large amount of money and the sample mean will converge to the mean of the distribution (which grows exponentially with time).

It’s a mathematical theorem. (I would be curious to see a proof of your theorem, by the way.)
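The two claims are about different quantities, which a small calculation may help separate. The sketch below assumes a fair coin and the 1.5 / 0.6 multipliers discussed in the thread; `expected_wealth` and `typical_wealth` are illustrative names, and "typical" here means a path with exactly half heads.

```python
import math

UP, DOWN = 1.5, 0.6  # multipliers for heads and tails


def expected_wealth(t, start=1.0):
    """Mean of the distribution after t flips: grows by 1.05x per flip."""
    return start * (0.5 * UP + 0.5 * DOWN) ** t


def typical_wealth(t, start=1.0):
    """Wealth on a path with half heads: shrinks by sqrt(0.9) per flip."""
    return start * math.sqrt(UP * DOWN) ** t


for t in (10, 100, 1000):
    print(t, "flips: mean", expected_wealth(t), "typical", typical_wealth(t))
```

The mean grows exponentially while the half-heads path decays exponentially, so for a fixed number of subjects and a long enough run the sample is dominated by decaying paths, whereas for a fixed run length and enough subjects the rare large winners are eventually sampled.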


mrow84 1944 days ago

The number of possible outcomes grows exponentially with time, and so does the ensemble size required to capture the extremal behaviour. Repeated losses bring you closer to zero, which is relatively well sampled by many realisations, but repeated wins produce exponentially larger returns, and so missing out on these realisations catastrophically affects the ensemble average.

A shorter run (say 100 steps) would be more likely to capture enough realisations to produce a reasonable estimate. You could assess this behaviour yourself, for very low step numbers, by calculating the variability in a sampled ensemble average, relative to the exhaustive (i.e. true) ensemble average.

This particular problem is another consequence of the properties of the dynamical system being examined, but not quite the same as the issues caused by its non-ergodicity.


mrow84 1944 days ago

I was interested in seeing the results myself, so here is some Python:

    import numpy as np
    import itertools
    from matplotlib import pyplot as plt

    def ensemble_mean(outcomes):
        # Assume we are given a (K, T) array of outcomes, and compute the ensemble average
        # for T+1 time steps, starting with 1 wealth.
        K, T = outcomes.shape
        X = np.ones((K, T+1), dtype=np.float64)
        X[:, 1:] = np.where(outcomes, 1.5, 0.6)
        Z = np.cumprod(X, axis=1)
        return Z.mean(axis=0)

    time_steps = 20

    all_outcomes = np.array(list(itertools.product([0, 1], repeat=time_steps-1)))
    exhaustive_mean = ensemble_mean(all_outcomes)

    ensemble_size = 100
    ensemble_samples = 10000

    ensemble_means = np.zeros((time_steps, ensemble_samples))
    for i in range(ensemble_samples):
        if i % 1000 == 0:
            print(i)  # progress indicator; printing every iteration floods the console
        # generate ensembles as though we were sampling (i.e. with replacement)
        J = np.random.choice(all_outcomes.shape[0], size=ensemble_size, replace=True)
        ensemble_means[:, i] = ensemble_mean(all_outcomes[J, :])

    plt.hist(ensemble_means[-1], bins=1000, histtype='step')
    plt.axvline(exhaustive_mean[-1])
    plt.title("Modal sampled ensemble mean is below true ensemble mean")
    plt.show()
