Hacker News new | ask | show | jobs
by sm0ss117 66 days ago
Mathematics seems like the ideal candidate for AIs to achieve absurd results. It's a purely abstract grammar with true auto-verifiability. Even SWE has the requirement of interacting with real physical things. In math there's no external feedback required, you're solely bounded by the rate and quality of token generation.
2 comments

This misses the mark on at least two accounts: 1. Proofs without human understanding have less value for mathematicians 2. At least for now, interestingness depends on human judgment. It is subjective and not as verifiable.
1. The four color theorem is a useful case study, for which the original proof was validated and 400 pages long. My prediction is that the first couple waves of proofs will be hard enough that a layman couldn't produce them, but simple enough that experts can verify them. Over time the most advanced proofs will get more and more complicated until humans can no longer verify them, this process could happen over the course of a few month or could take literally hundreds of years.

2. Especially early on the overwhelming majority of the proofs are likely to be uninteresting and more novel just because actually producing them would take expert time that's better spent elsewhere. That being said, as above over time I expect the interestingness of proofs to go up until they eventually regularly produce interesting proofs. The vast majority of proofs are likely to maintain their position as of no interest to humans for the simple reason that the vast majority of proofs are of no interest to humans.

In neither case will I make any particular guesses about a timeline beyond it seems like the way things will go.

Every new mathematician that comes along doesn’t know everything that has come before him. He needs to go learn all the math that his predecessors did. I don’t see how an LLM coming up with these proofs changes that.
Because the problem space is basically infinite. If a person is working on a problem, its probably interesting to at least one person. Randomly walking through the problem space might be interesting, but I don't know how the signal will fare against other humans.
Grammar seems like you’re talking about LLMs specifically. Well, isn’t Sudoku just math? LLMs suck at Sudoku last I checked. When told not to code a solver, its very first deduction was wrong.
Generally when people talk about using LLMs to do mathematics research they’re not talking about the LLM alone, but the LLM + a harness for it to write and execute theorem provers such as Lean or Coq to validate their results.
I guess I just don’t have the experience or optimism that a harness around an LLM, which can’t make the first, bare deduction on its own, is a good use of compute.

I got out of RLHF, including games and puzzles, before agents took off and maybe I have outdated info. But we estimated RLHF’ing a single hard full sized sudoku was ~25 hours worth of work.