K(x) is the Kolmogorov complexity of x, i.e., the length of the shortest program that halts and outputs x, typically defined with respect to a universal Turing machine.
I(x : y) = K(x) + K(y) - K(x, y) is the mutual algorithmic information of two strings. It is the algorithmic equivalent to Shannon's mutual information. No algorithm f can increase the mutual algorithmic information between two strings, i.e. ∀f I(x : f(y)) ≤ I(x : y) + O(log(K(f))).
I used "data" to denote any collection of interesting data at all. For example, it could be data obtained from the real world, the observed behavior of Turing machines at very long timescales, or a collection of mathematical proofs. By "ZFC" I just mean the description of any program that enumerates all proofs of ZFC.
The takeaway here is that ZFC is very limited in what it can say about real world data or even what it can say about almost all deterministic computations, including those with very short programs. Levin describes this much better than I could in his famous "Forbidden Information" paper: https://arxiv.org/pdf/cs/0203029.
Levin further argues that extending PA is unlikely... I tend to disagree on that point. How did we get the more powerful theory of ZFC or any of these axioms in the first place then? They are coming from somewhere. But I'm not even going to attempt to debate Levin's case on Hacker News haha
I(x : y) = K(x) + K(y) - K(x, y) is the mutual algorithmic information of two strings. It is the algorithmic equivalent to Shannon's mutual information. No algorithm f can increase the mutual algorithmic information between two strings, i.e. ∀f I(x : f(y)) ≤ I(x : y) + O(log(K(f))).
I used "data" to denote any collection of interesting data at all. For example, it could be data obtained from the real world, the observed behavior of Turing machines at very long timescales, or a collection of mathematical proofs. By "ZFC" I just mean the description of any program that enumerates all proofs of ZFC.
The takeaway here is that ZFC is very limited in what it can say about real world data or even what it can say about almost all deterministic computations, including those with very short programs. Levin describes this much better than I could in his famous "Forbidden Information" paper: https://arxiv.org/pdf/cs/0203029.
Levin further argues that extending PA is unlikely... I tend to disagree on that point. How did we get the more powerful theory of ZFC or any of these axioms in the first place then? They are coming from somewhere. But I'm not even going to attempt to debate Levin's case on Hacker News haha