Hacker News new | ask | show | jobs
by knappa 519 days ago
Mathematician here (trained as pure, working as applied). Non-elegant proofs are useful, if the result is important. e.g. People would still be excited by an ugly proof of the Riemann hypothesis.^1 It's important too a lot of other theorems if this is true or not. However, if the result is less central you won't get a lot of interest.

Part of it is, I think, that "elegance" is flowery language that hides what mathematicians really want: not so much new proofs as new proof techniques and frameworks. An "elegant" proof can, with some modification, prove a lot more than its literal statement. That way, even if you don't care much about the specific result, you may still be interested because it can be altered to solve a problem you _were_ interested in.

1: It doesn't have to be as big of a deal as this.

5 comments

Then again, even an 'elegant' proof can be surprisingly inflexible. I've recently been working through Apéry's proof that ζ(3) is irrational. It's so simple that even a clueless dabbler like me can understand all the details. Yet no one has been able to make his construction work directly for anything else (that hasn't already been proven irrational). C'est la vie, I suppose.
There was a post yesterday of a quanta article: https://news.ycombinator.com/item?id=42644896.

The article explains that two mathematicians were able to place Apery's proof that ζ(3) is irrational into a much wider (and hence more powerful) framework. I doubt that framework is as easy to understand as the original proof. But in the end something with wider applicability did come out of the proof.

Yeah, many of the fancy analytic methods are beyond my level of dabbling. I've been trying to learn more about them, so I can solve the myriad exercises left to the reader in all the Diophantine approximation papers.

Still, the newer methods publicized in the Quanta article definitely get more involved, and at least from my perspective they don't establish things as elegantly as Apéry's ζ(2) and ζ(3) arguments do. Hopefully they turn out to be powerful in practice, to make up for it.

One thing that many mathematicians today don’t think about is how deeply intertwined the field has historically been with theology. This goes back to the Pythagoreans at least.

That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness. Which, incidentally, are directly related to logic, aesthetics, and ethics.

The value of truth in a proof is most obvious.

The value of aesthetics is harder to explain, but there's no denying that it is in fact observably valued by mathematicians.

As for ethics, remember that human morality is a proper subset thereof. Ethics concerns itself with what is good. It may feel like a stretch, but it's perfectly reasonable to say that for two equally true proofs of the same thing, the one that is more beautiful is also more good. Also, obviously, given two equally beautiful proofs, if only one is true then it is also more good.

> That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness

As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back. Gödel proved there are an infinite number of true arithmetic statements unprovable within any (consistent, sufficiently powerful) formal system. But our "gold standard" formal system, ZFC, has about as many axioms as we have fingers — why is finding more axioms not the absolute highest priority of the field?

We struggle to prove facts about Turing machines with only six states, and it's not obvious to me that ZFC is even capable of resolving all questions about the behavior of six state Turing machines (well, specifically just ZF, as C has no bearing on these questions).

Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part). If we can't predict the behavior of the majority of tiny, deterministic systems with ZFC, what does that say about our ability to understand and predict real world data, particularly considering that this data likely has an underlying algorithmic structure vastly more complex than that of a six state Turing machine?

More formally, my complaint with the culture of mathematics is:

1) We know that for any string of data, I(data : ZFC) ≤ min(K(data), K(ZFC)) + O(1)

2) K(ZFC) is likely no more than a few bytes. I think the best current upper bound is the description length of a Turing machine with a few hundred states, but I suspect the true value of K(ZFC) is far lower than that

3) Thus K(data) - K(data | ZFC) ≤ "a few bytes"

Consider the massive amounts of data that we collect to train LLMs. The totality of modern mathematics can provide no more than a few bytes of insight into the "essence" of this data (i.e., the maximally compressed version of the data). Which directly translates to limited predictability of the data via Solomonoff induction. And that's in principle — this doesn't even consider the amount of time involved. If we want to do better, we need more axioms, full stop.

One might counter, "well sure, but mathematicians don't necessarily care about real world problems". Ok, just apply the same argument to the set of all arithmetic truths. Or the set of unprovable statements in the language of a formal system (that are true within some model). That's some interesting data. Surely ZFC can discover most "deep" mathematical truths? Not very likely. The deeper truths tend to occur at higher levels of the arithmetic hierarchy. The higher in the hierarchy, the more interesting the statement. And these are tiny statements too: ∀x ∃y ∀z [...]. Well we're already in trouble because ZFC can only decide a small fraction of the Π_2 statements that can fit on a napkin and it drops off very quickly at higher levels than that. Again, we need more axioms.

> Yet Turing machines are about as far from abstract mathematics as one can get, because you can actually build these things in our physical universe and observe their behavior over time (except for the whole "infinite tape" part)

The infinite tape part isn't some minor detail, it's the source of all the difficulty. A "finite-tape Turing machine" is just a DFA.

> is just a DFA

Oh is that all? If resource bounded Kolmogorov complexity is that simple, we should have solved P vs NP by now!

I debated adding a bunch of disclaimers to that parenthetical about when the infinite tape starts to matter, but thought, nah, surely that won’t be the contention of the larger discussion point here haha

It’s an LBA, a Linear Bounded Automata.
No, an LBA in general doesn't have a finite tape. It still has an infinite tape, to accommodate arbitrary length inputs, it's just that the tape cannot grow beyond the length of its input (or a constant multiple of it, which is equivalent by the linear speedup trick).
Arbitrarily long and infinite are very different things.
> > That survives in the culture of mathematics where we continue to see a high regard for truth, beauty, and goodness

> As a non-mathematician, I've noticed this as well, and I have a suspicion the historical "culture" is holding the field back.

Holding the field back from what? If the goal of the practitioners of the field is to seek mathematically beauty, then well, that is what they will focus on.

Besides that, I don't really follow your argument about Godel & information theory & that adding more axioms is the key to moving math forwards. In the vast majority of cases, the difficulty in finding a proof of a statement is not that the statement isn't provable under a given formal system, it's that we simply can't find it. But maybe I misunderstand you?

> Holding the field back from what? If the goal of the practitioners of the field is to seek mathematically beauty, then well, that is what they will focus on.

Holding the field back from answering questions about the behavior of simple deterministic systems that arguably have an outsized influence on science, machine learning, and mathematics itself. I don't disagree that "beauty" is certainly subjective, but the mathematics research that gets funded by universities and research labs is only tangentially related to such an aesthetic notion.

> In the vast majority of cases, the difficulty in finding a proof of a statement is not that the statement isn't provable under a given formal system, it's that we simply can't find it

Finding a proof is an inherently algorithmic process. Presumably, the human brain uses heuristics (intuition) that have thus far managed to beat simple algorithmic approaches. But it's hard to believe that's going to last much longer. As an extreme example, simply knowing more values of the Busy Beaver function allows one to find proofs more easily. For example, encode the veracity of the statement corresponding to Fermat's Last Theorem into a program (a k-state Turing machine) that dovetails enumeration over all possible values for x, y, z, and n > 2. This program halts if the statement is satisfied by some set of natural numbers and never halts otherwise. But if we know which k-state Turing machine corresponds to BB(k), then we just run the two programs in parallel and if the Fermat program hasn't halted by the time the program for BB(k) has, then the conjecture is true (necessarily false otherwise). Scott Aaronson has written some articles and blog posts about this.

Now the argument I'm making doesn't necessarily depend on trying to determine more values of the uncomputable busy beaver function. You don't have to know the fastest growing computable sequence in order to derive some benefit from greater knowledge of the halting sequence. If we presume that an interesting proof exists within ZFC, then more axioms (that constitute a more powerful system) can lead to much shorter proofs. This is a result obtained from the work of Gödel, Feferman, Jean-Yves Girard, et al. So while you're right that a conjecture we are interested in might have a proof that "exists" within ZFC, whether the minimal such proof can be written down in a human lifespan is a different question. Considering there are many famous conjectures that have been open for centuries now, perhaps the next step isn't having more people stare harder at the problem but rather enriching our foundational formal systems with additional axioms.

I can’t tell if this is crazy or brilliant. Math has been working diligently for a long time to reduce the axioms. Most of the obvious Gödel sentences are stupid things like there is a number that is the proof of itself. The whole project is to derive all of the structure of mathematics, with a high information complexity, from basic axioms but also from comp,ex definitions. I think the definitions (natural numbers as sets, integers as equivalence sets of pairs of natural numbers, etc.) pump up the information complexity from the axioms. Like the initial state of Life allowing arbitrary computation from the simple Life rules.

The idea that there might be more axioms that would let one deduce more about computable complexity classes or the like seems pretty unlikely.

The number of provable statements and unprovable statements is countably infinite and we aren’t lacking the ability to prove things due to obviously true missing axioms.

> 1) We know that for any string of data, I(data : ZFC) ≤ min(K(data), K(ZFC)) + O(1)

What are I and K? What does "data : ZFC" mean?

K(x) is the Kolmogorov complexity of x, i.e., the length of the shortest program that halts and outputs x, typically defined with respect to a universal Turing machine.

I(x : y) = K(x) + K(y) - K(x, y) is the mutual algorithmic information of two strings. It is the algorithmic equivalent to Shannon's mutual information. No algorithm f can increase the mutual algorithmic information between two strings, i.e. ∀f I(x : f(y)) ≤ I(x : y) + O(log(K(f))).

I used "data" to denote any collection of interesting data at all. For example, it could be data obtained from the real world, the observed behavior of Turing machines at very long timescales, or a collection of mathematical proofs. By "ZFC" I just mean the description of any program that enumerates all proofs of ZFC.

The takeaway here is that ZFC is very limited in what it can say about real world data or even what it can say about almost all deterministic computations, including those with very short programs. Levin describes this much better than I could in his famous "Forbidden Information" paper: https://arxiv.org/pdf/cs/0203029.

Levin further argues that extending PA is unlikely... I tend to disagree on that point. How did we get the more powerful theory of ZFC or any of these axioms in the first place then? They are coming from somewhere. But I'm not even going to attempt to debate Levin's case on Hacker News haha

There are plenty of mathematicians - mostly set theorists - who are actively working on finding new axioms of mathematics to resolve questions which can't be resolved by ZFC. Projective Determinacy is probably the most important example of a new axiom of mathematics which goes far beyond what can be proved in ZFC, but which has become widely accepted by the experts. (See [1] for some discussion about the arguments in favor of projective determinacy, and [2] for a taste of Steel's position on the subject.)

I suggest reading Kanamori's book [3] if you want to learn more about this direction. (There are plenty of recent developments in the field which occured after the publication of that book - for an example of cutting edge research into new axioms, see the paper [4] mentioned in one of the answers to [5].)

If you are only interested in arithmetic consequences of the new axioms, and if you feel that consistency statements are not too interesting (even though they can be directly interpreted as statements about whether or not certain Turing machines halt), you should check out some of the research into Laver tables [6], [7], [8], [9]. Harvey Friedman has also put a lot of work into finding concrete connections between advanced set-theoretic axioms and more concrete arithmetic statements, for instance see [10].

[1] https://mathoverflow.net/questions/479079/why-believe-in-the... [2] https://cs.nyu.edu/pipermail/fom/2000-January/003643.html [3] "The Higher Infinite: Large Cardinals in Set Theory from their Beginnings" by Akihiro Kanamori [4] "Large cardinals, structural reflection, and the HOD Conjecture" by Juan P. Aguilera, Joan Bagaria, Philipp Lücke, https://arxiv.org/abs/2411.11568 [5] https://mathoverflow.net/questions/449825/what-is-the-eviden... [6] https://en.wikipedia.org/wiki/Laver_table [7] "On the algebra of elementary embeddings of a rank into itself" by Richard Laver, https://doi.org/10.1006%2Faima.1995.1014 [8] "Critical points in an algebra of elementary embeddings" by Randall Dougherty, https://arxiv.org/abs/math/9205202 [9] "Braids and Self-Distributivity" by Patrick Dehornoy [10] "Issues in the Foundations of Mathematics" by Harvey M. Friedman, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3137697

> Part of it is, I think, that "elegance" is flowery language that hides what mathematicians really want: not so much new proofs as new proof techniques and frameworks. An "elegant" proof can, with some modification, prove a lot more than its literal statement. That way, even if you don't care much about the specific result, you may still be interested because it can be altered to solve a problem you _were_ interested in.

Do you feel this could be a matter of framing? If you view the "proof" as being the theorem prover itself, plus the proof that it is correct, plus the assumptions, then whatever capability it gains that lets it prove your desired result probably is generalizable to other things you were interested in. It would seem like a loss if they're dismissed simply because their scratch work is inscrutible.

"It doesn't have to be as big of a deal as this."

Agree. The truthfulness of the four-colour theorem is good to know, although there is not yet any human-readable proof.

I feel like the four-color theorem automated proof is much more 'human-readable' than the proofs done with automated theorem provers. Because with the four-color theorem, there is a human readable proof that says "if this finite set of cases are all colorable, then all planar graphs are colorable". And then there is some rather concrete code that generates all the finite cases, and finds a coloring for them. Every step in there makes sense, and is fully understandable. The fact that the exhaustive checking wasn't done by hand doesn't mean its hard to understand how the proof works, or what is 'actually going on'.

For a general theorem prover, reading the code doesn't explain anything insightful about why the theorem is true. For the 4 color theorem, the code that proved it actually gives insight in how the proof works.

And of course if you come across an inelegant proof you suddenly have the opportunity to think about it and see if you can make it more elegant!