by conjectures 2797 days ago
Good point. But... this particular objection has a flaw (which may be copied by the BBH).

The Bayesian machinery tells you how to update your beliefs given evidence. It doesn't tell you what shape those beliefs should be in the first place.

My theory is that we carry around a deck of personas or stereotypes. We hear of a new person, and some behaviour of theirs. We then infer which persona was likely to have generated that behaviour. Conditional on the resulting distribution over personas, we answer questions about that person's predicted behaviour.

In the `Linda` example, the theory above suggests we take the background information about her political commitments at college, which from the description would seem to give strong evidence about her persona.

The common and wrong answer to the question stems from predicting, based on the persona, that the person would continue with their political commitments.
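To make the persona idea concrete, here is a minimal sketch of that two-step inference: update a distribution over personas from one observed behaviour, then average each persona's prediction for a later behaviour. The persona names and every number below are made up for illustration; nothing here comes from the original experiment.

```python
# Hypothetical personas and probabilities, chosen only for illustration.
prior = {"activist": 0.2, "careerist": 0.5, "homebody": 0.3}

# P(observed college behaviour | persona) -- assumed values.
likelihood = {"activist": 0.9, "careerist": 0.1, "homebody": 0.05}

# P(stays politically committed later | persona) -- assumed values.
p_stays_committed = {"activist": 0.8, "careerist": 0.1, "homebody": 0.05}


def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of personas."""
    unnormalised = {k: prior[k] * likelihood[k] for k in prior}
    evidence = sum(unnormalised.values())
    return {k: v / evidence for k, v in unnormalised.items()}


post = posterior(prior, likelihood)

# Predicted behaviour: average over personas, weighted by the posterior.
predicted = sum(post[k] * p_stays_committed[k] for k in post)

print(post)
print(predicted)
```

Note that Bayes' rule only does the reweighting inside `posterior`. The choice to model people as a small deck of personas at all is made before any updating happens.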

The 'shape being wrong' issue here is that maybe personas are not the right way to structure the problem. But Bayes's theorem doesn't tell you that. That's a whole load of extra machinery that people have additionally developed and should deploy when using Bayes's theorem.

Back to the word problem. An issue is that both options include A:

- A

- A&B

In a world where A always happens, the only non-trivial way to read the question is that the first option must implicitly mean (A&!B).

If A is assumed to be true, the question then becomes whether B is more likely to be true or false. The background information given is a reasonable explanation for the common answer of people selecting (A&B).
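A quick sketch of the two readings, with made-up numbers. Under the literal reading, (A&B) can never beat A. Under the implicit reading, where the first option means (A&!B), the comparison reduces to whether P(B|A) exceeds one half. The function name and the value 0.6 are assumptions for illustration only.

```python
def option_probabilities(p_a, p_b_given_a):
    """Probabilities of the options under both readings of the question."""
    p_a_and_b = p_a * p_b_given_a
    p_a_and_not_b = p_a * (1.0 - p_b_given_a)
    return {"A": p_a, "A&B": p_a_and_b, "A&!B": p_a_and_not_b}


# Assume A is taken as given (P(A) = 1) and the background story makes B
# somewhat likely; 0.6 is an arbitrary illustrative value.
probs = option_probabilities(p_a=1.0, p_b_given_a=0.6)

# Literal reading: the conjunction is never more probable than A alone.
print(probs["A&B"] <= probs["A"])

# Implicit reading: (A&B) beats (A&!B) exactly when P(B|A) > 0.5.
print(probs["A&B"] > probs["A&!B"])
```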

1 comments

Solving epistemology doesn't solve ontology. Bingo.
In more strictly mathematical terms, having a "normative" update rule (Bayes' rule) doesn't tell you what topology of latent variables the generative model "ought" to have, only how to link new information into a preexisting generative model.

Using the KL divergence of the posterior predictive distribution as a target to optimize does a bit better, but still isn't a "solution".

Seriously, what does "topology of latent variables" even mean?

A topology on U is a system of subsets of U that's closed under arbitrary unions and finite intersections and contains the empty set and U itself. Go!

>Seriously, what does "topology of latent variables" even mean?

The simple answer is: the graph topology of the resulting program traces, equivalent to the topology of a graphical model sampled from a distribution over graphical models. The complicated answer is: the Scott topology of the program-trace space.