| HN Mirror

>Self-trust seems like a problem on paper, but the reality is that normal mathematics uses very few universes.

Self-trust (broadly construed) is interesting to me because it seems relevant to designing goal-based agents that are "stable", in the sense that they trust that future versions of themselves will have accurate beliefs (and therefore don't have an incentive to mess around with their systems for forming beliefs). If we try to formalize this intuition with "beliefs" as theorems proven by a formal system, we run into reflection problems; having your theorem prover assert that it will keep outputting only true statements feels awfully close to asserting its own soundness. So even if your agent can perform all the usual mathematical reasoning it needs, it still can't do all the useful reasoning about itself (it would need another large cardinal... and then another...).

The self-trust property in the paper says that it's possible to "learn from experience" that your future self is probably going to have pretty good beliefs. Specifically, a logical inductor P_n learns (roughly speaking) that "if P_f(n) thinks Phi is likely, then Phi is likely", where f(n) can be a fast-growing computable function. That is, on day n, P_n believes a sort of "probabilistic soundness" condition for its future self P_f(n). This is weaker than full soundness in at least two ways, but it is fully "reflective" in the sense that P believes this of itself.