I'd call this paper a "big deal" in that it is a normalization of, very fair summary of, and indication that there is a future for, LLMs in pure mathematics from one of its leading practitioners.
On HN here, we've spent the last few years talking and thinking a lot about LLMs, so the paper might not include much that would be surprising to math-curious HN'ers. However, there is a large cohort of research mathematicians out there that likely doesn't know much about modern AI; Terence is saying there's a little utility in last-gen models (GPT-4), and he expects a lot of utility out of combining next-gen models with Lean.
Again, not surprising if you read his blog, but I think publishing a full paper in AMS is a pretty important moment.
Our world is increasingly defined by software without correctness proofs. Our tools are too clumsy, and we're just not smart enough, so we accept this situation. AI-verified code could become one of the most economically important applications of machine learning, when we cross the threshold where this becomes feasible.
I'm a mathematician, and we struggle with the purpose of a proof: Is it to verify, or also to explain so we can generalize? Machine proofs tend to be inscrutable.
While "the singularity" is unlikely anytime soon, we should recall that before the first atomic test there were calculations to insure that we didn't ignite the atmosphere. We're going to need proofs that we have control of AI, and there's an obvious conflict of interest in trusting AI's say so here. We need to understand these proofs.
A good thing about machine proofs is that, just like code, they can be refactored. I also use LLMs when writing code, but the code that I end up pushing is almost never exactly what the LLM has generated. I don't really see the problem with that. It's way less taxing for me mentally to get an LLM to generate definition and implementations and just refactor them quickly. I would expect LLMs for lean to be similar in the future.
Interesting thoughts, thanks. It seems to me a model trained to generate Lean could also be purposed to explain a large Lean proof, and that’s very interesting. So much of modern math is limited to extremely small cohorts.
None of that is dispositive inre provable safety, obviously.
LLMs as they are I postulate would not work well for this. But, purpose built stochastic auto complete with a type checker to reject the junk? That could be actually useful. Funnily enough it's also a domain of application that wouldn't make any money at all. It would have to be an offline LLM that is reasonably efficient to execute locally.
I think his comparison to previous machine-assistance is misleading. In previous cases, the use of machines was never creative, whereas now AI has the ability to suggest creative lines.
In the short-term, this sounds exciting. But I also think it reduces the beauty of math because it is mechanizing it into a production line for truth, and reduces the emphasis on the human experience in the search for truth.
The same thing has happened in chess, with more people advocating for Fischer random because of the boring aspect of preparing openings with a computer, and studying games with a computer. Of course, the computer didn't initiate the process of in-depth opening preparation but it launched it to the next level. The point is that mechanization is boring except for pure utility.
My point is that math as a thing of beauty is becoming mechanized and it will lead to the same level of apathy in math amongst the potentially interested. Of course, now it's exciting because it's still the wild west and there are things to figure out, but what of the future?
Using advanced AI in math is a mistake in my opinion. The search for truth as a human endeavor is inspiring. The production of truth in an industrialized fashion is boring.
> The production of truth in an industrialized fashion is boring.
Chess is a game, if it gets boring that defeats the point…. Math on the other hand is basic science research, and enables us to understand how the universe and our bodies work, to massive benefit. I don’t care how “boring” it is, the knowledge could have immense value or be critical for our survival and if AI can allow us to access it, all the better.
I would argue with your "massive benefit", unqualified. Because while it is true that basic research HAS improved life, much of modern research and technical production has decreased it: less sense of community, microplastics, climate change, destabilization of the working force (AI), less sense of purpose. What's the use of an even longer life if our entire way of life is just a production to add incremental levels of safety, which hardly helps most anyway?
While you might be right, it is VERY far from clear that the latest advanced research will really help us. It could also lead to our annihilation or at least dehumanization, which is really not much better than annihilation.
In fact, due to the immense damage technology has caused, the burden of proof should be on the technologists to reliably demonstrate that new technology is even worth it beyond propping up our broken, global capitalistic system.
I’m not talking about technology, I’m talking about having an understanding of how reality works. I fully agree with you that one should have a sense of ethics and use the precautionary principle when deciding what to do with that knowledge. With deeper knowledge we can develop more humane and environmentally safe technology, and cure diseases that cause massive suffering…
We’re past the point of just going back to preindustrial tech with less negative impacts- but with a deeper understanding of reality we could, e.g. pull carbon straight out of the atmosphere to manufacture almost anything in a renewable way, while also understanding biochemistry enough to make things that won’t be toxic or persist in the environment, and are readily broken down and reused.
> With deeper knowledge we can develop more humane and environmentally safe technology, and cure diseases that cause massive suffering…
This is where we fundamentally disagree. I don't believe (and I've never seen any convincing evidence) that we could EVER develop more human and environmentally safe technology. Primarily because technology always requires physical resources (mining) and habitat destruction, and because there are 8 billion people in the world and there will always be the unscrupulous who will use that technology for destruction. And even ignoring the unscrupulous, the existing habitat destruction from said technology use already (in my valuation) is too great to balance out some of the so-called positive uses of technology.
> Primarily because technology always requires physical resources (mining) and habitat destruction
There is no fundamental reason to require such things. As Fuller said, the goal of technology is to “do more and more with less and less until you can do everything with nothing.”
In my view it’s not the idea of technology that has been the problem, but that it was done by people with no understanding of the impacts, or sense of responsibility. The reason our current tech is so nasty and damaging is because our knowledge has been too primitive to do better thus far, and now people are not willing to give it up.
Synthetic biology, for example can now pull carbon straight from the air and make it into nontoxic biodegradable building materials- or really almost anything. This can eventually replace all mining and toxic chemical factories, with basically just old fashioned fermentation in a vat, that neither produces or uses anything toxic- but can replace all of the nasty stuff we currently make from mining and petroleum. A deeper understanding of biology will allow us to further reduce risks and environmental impacts by really deeply understanding which molecules we can make safely without toxic impacts on humans or other species, and without environmental persistence.
You think people in the hunter gatherer days lived a better more humane life? You walked too close to a branch and got a small cut and a few weeks later you are dead. Oh no, You fell and can't get up while a predator is chasing your group, better prepare to be eaten alive. You accidentally ate that one fruit that looks similar to another one, oops you die shitting your bowels out. What's that? You are getting your third child, too bad the other two died in child birth together with their mother.
I really don't think you realize how cushy modern life is compared to even the hardships of a few hundreds years ago.
> I don't believe (and I've never seen any convincing evidence) that we could EVER develop more human and environmentally safe technology.
Come on now. Renewable energy is gaining on fossil fuels around the world. The air in London used to be thick with smog, and now it's not. Acid rain is a thing of the past. The ozone hole is shrinking.
Fire and the wheel are technology; are you against them too?
> In fact, due to the immense damage technology has caused, the burden of proof should be on the technologists to reliably demonstrate that new technology is even worth it beyond propping up our broken, global capitalistic system.
A fair argument as far as it goes, but unlike engineering and some other activities, mathematics isn't about creating more technology. Although mathematics can be applied to that purpose, that's not its essence.
If electronics and computers didn't exist, if industrial society didn't exist, mathematics would still exist, and perhaps it wouldn't be confused so often with its applications.
Mathematics isn't responsible for how we choose to apply it.
Human chess players are still incredibly valuable because we want to see what humans are capable of. For the same reason athletes are valuable even though a car can outrun them.
With mathematicians, and others working in intelligence-intensive tasks (most of us here probably), I’m not sure what the value would be post-AGI.
The point is that even with mathematics and programming, there is an underlying community aspect that cannot be ignored, but is hidden under layers of utility. For example, even in programming, people getting together to code, collaborating, and sharing their projects is a small but significant drop in people creating a community.
With mathematics, the sharing of ideas and slaving over the proof of a theorem brings meaning to lives by forging friendships. Same with any intellectual discipline: before generative AI, all the art around us was primarily from human minds and were echoes of other people through society.
Post-AGI, we abandon that sense of community in exchange for pure utility, a sort of final stage of human mechanization that rejects the very idea of community.
If a change in other people's ability to do mathematics affects your level of enjoyment in doing mathematics, you don't really enjoy mathematics. You enjoy feeling smarter than other people, of belonging to an exclusive club.
Preserving people's access to this kind of enjoyment is not something that should carry any weight in my opinion.
Oh come on, that's ridiculous. I wasn't referring to a change in ability, but a change in culture. The modern culture of mathematics is getting worse in my opinion, and many feel the same. Besides, I don't even practice math any more...
One of Bill Thurston's answers on MathOverflow should be required reading on this and a lot of related topics. When basically asked "How do I cope with the fact that I'm no Gauss or Euler?" he replied:
> The product of mathematics is clarity and understanding. Not theorems, by themselves... mathematics only exists in a living community of mathematicians that spreads understanding and breaths life into ideas both old and new. The real satisfaction from mathematics is in learning from others and sharing with others. All of us have clear understanding of a few things and murky concepts of many more. There is no way to run out of ideas in need of clarification. The question of who is the first person to ever set foot on some square meter of land is really secondary. Revolutionary change does matter, but revolutions are few, and they are not self-sustaining --- they depend very heavily on the community of mathematicians.
Ongoing relationships and cooperation is how humanity does its peak stuff and reaches peak understanding (and how humans usually find the most personal satisfaction).
LLMs are powerful precisely because they're a technology for concentrating and amplifying some aspects of cooperative information sharing. But we also sometimes let our tools isolate.
Something as simple as a map of a library is an interesting case: it is a form of toolified cooperation, you can use it to orient yourself in discovering and orienting library materials without having to talk to a librarian, which saves time/attention... and also reduces social surface area for incidental connection and cooperation.
That's a mild example with very mild consequences and only needs mild individual or cultural tools in order to address the tradeoffs. We might also consider markets and the social technology of business which have resorted in a kind of target-maximizing AGI. The effects here are also mixed, certainly in terms of connection / isolation, also potentially in terms of environmental impact. A paperclip maximizer has nothing on an AGI/business that benefits from mass deforestation, and we created that kind of thing hundreds of years ago.
The question is if we're going to maintain the kind of social/cultural infrastructure that could help us be aware of and continue to invest in the value the social/cultural infrastructure.
Or, put more simply, if we're going to build a future for people.
Again, I am not arguing against ALL computer use of chess. Just the chess engine/AI itself. Why do you insist on taking all of technology as an indivisible unit in your argument?
They’re the same technology: you don’t get to select only some of the applications, which appeal to your personal aesthetics.
We arrived at engines before online chess, and the two have come up together — both being enabled by the growth of computers. You can choose not to use an engine, but it will exist either way because others will choose to use it when it’s enabled by those same things.
To get rid of the engine, you have to get rid of computers — or in the case of Freestyle/Chess960, create so many openings a human can’t memorize them all so only has a short time to prepare.
I think there will always be a demand for human knowledge workers. They might not push their respective fields forward in the same capacity as AI will be able to, but there will be a niche market for products and ideas authored entirely by humans. Programmers and mathematicians will actually be craftspeople, and communities will continue to exist around this. These will probably not be highly paid positions as they are today, and their products likely won't power mission-critical infrastructure. Some might pursue it simply as a hobby and for the mental exercise.
It wouldn't be much different from small artisan shops we have today in other industries. Mass production will always be more profitable, but there's a market for products built on smaller scales with care and quality in mind. Large companies that leverage AI black boxes won't have that attention to detail.
The problem with this is that most people will sense a reduced importance for themselves. Most people seem to think that with AI doing everything, we can just relax and do our hobbies. But that's just wishful thinking based on a culture of overworking: we overwork so we dream of a utopia where we don't work. But the opposite of overworking is a sense of complete irrelevance, which will in some sense be more problematic than everyone working too much.
Yes, a few people might find some meaning in a life where they are not that important, but most people need to feel important to others, and AI takes that away.
That's true. It's a problem that isn't discussed nearly enough.
This is partly why I think that the pace of AI development needs to slow down. We've had disruptive technologies in the past, and society eventually adapted when new jobs were created, but none of them had the potential to completely replace humans in most industries. None of them raised existential questions about our humanity, the value of human labor, our place in society, and the core pillars of economy, education, etc. And, crucially, none of them were developed in just a few years.
We need time to discuss these topics and prepare for the shift. But, of course, any mention of slowing down is met with criticism of regulations stifling innovation and profits, concern about losing a technological advantage over political opponents, etc., so this is unlikely to happen.
This century certainly won't be boring, so let's enjoy the ride, and hope that no major conflict pops off. Though with the way things are going, my hope is waning.
Well, I certainly agree with slowing things down. Time to discuss would be much better than nothing. As I tell many, I am glad I was born when I was, and not now. I cherish the time I had before anyone knew of the internet. Even though I am using it now, mostly to spread my ideas, I would gladly trade it for a world where it didn't exist.
Bluntly speaking, I think it is going to be the journey that matters; to work with mathematics is to work on yourself and a way to explore your creativity.
Right, I enjoy programming for the same reasons. But will I be able to make a living from it, 20 years from now? Probably not.
To be clear I don’t think AI is bad, and even if it is I’m pretty damn sure it’s not avoidable. But we’re in for drastic changes and we should start getting used to it.
I think it's a sad thing that people may not be able to make a living from what they love. A lot of people suggest hobbies, but I think it's nice to contribute to society with the skills of our minds.
I think we have to do both: get used to it, but fight it at the same time in case we can get rid of it.
> Using advanced AI in math is a mistake in my opinion.
I too feel saddened by the idea of automating math if it leads to inscrutable proofs or theories. But it seems essential that we advance the tools used for mathematical discovery if we hope to keep advancing. Hopefully we can find a balance where the advancement of mathematical understanding continues to be a human story, but we're not artificially held back by pretending new tools don't exist.
Computers allow us to scale what we analyze in math — and that’s a good thing.
Nobody examines structure of group diagrams because drawing interesting ones by hand is borderline impossible, but takes just a few minutes on a computer. However, they’re a natural way to arrive at algebraic/geometric equivalence. (And indeed, the first time I had an intuition for it.)
To me, you sound like someone lamenting swimming is meaningless because we invented boats.
Again, like so many, you are taking computer use in math as an indivisible whole. I never said that computers were NOT useful. Only that the use of creative AIs in math are counterproductive in the long run, hence implying a point of diminishing returns that we push towards (due to the peverse incentives of academia).
There is also a fundamental difference between swimming and math. There is no prisoner's dilemma situation when it comes to swimming: with swimming, people CHOOSE to swim because they like it. But due to different incentives, people will CHOOSE to use AI only because others use it and it will become the only path eventually.
In other words, swimming is still possible even though boats exist. People going into mathematics will not have the possibility of being of any use without AI, because the prisoner's dilemma (arms race) will ensure that math is no longer about anyone caring about math without AI.
You’re lamenting that inventing boats has destroyed the beauty of swimming.
- - - Edit - - -
Responding to your expanded argument:
You could never swim to a new continent, which boats enabled. This is the same — people can choose to keep doing the same limited math themselves, in a slower way, but will never reach the places people can aided by tools. That’s simply how the world is. But we shouldn’t restrict the distance people can travel to adhere to the aesthetics of swimming.
You’re arguing precisely that: we must limit our intellectual journey because you don’t approve of the aesthetics of the tool to travel further.
I specifically gave a reason why boats and swimming is an entirely different situation. Due to different incentives, AI can take away opportunities for people to learn math the old fashioned way, but boats did not do that to swimming precisely because the incentives for swimming (moving without a boat) are different. But I added that in an edit before I saw your comment.
Those reasons exist for learning math: mental fitness, personal enjoyment, sport, etc.
But you’ve subtly changed your argument: before you were arguing that the beauty was in creating mathematics, not merely learning already written mathematics.
My exact point is that learning surmathematics (math taken further by AI) is it’s own interesting pursuit — and appeals to my sense of aesthetics and adventure more than piddling around merely to say it was all done by human hands.
I’m not following where you believe the swimming and boat analogy breaks down: there’s still the same personal reasons to learn and do mathematics one might swim; but learning surmathematics is an adventure to a whole new land.
- - - Edit - - -
Responding to sibling comment as well:
> I am arguing that we should limit our intellectual journey, to preserve the humanistic aspects of the journey. That is exactly my position.
That’s exactly what I compared to swimming rather than boats — because you won’t reach the same places and it’s done for aesthetic reasons.
Some people (eg, myself) want the surmathematics adventure.
How is AI different from using a calculator? AI is just giving a new abstraction layer, in the same way that computers have done before AI. And these non-AI tools have allowed us to produce both deep research and beautiful theories.
I'd be more worried about the problems related to a few companies concentrating all the tools and therefore the power.
AI is different because its ability to suggest creative lines of thinking will change the entire structure of how mathematics is done.
It's the same difference between "dumb" algorithms and "generative AI" algorithms. The generative AI has the capability to replace human thinking in some cases, whereas the dumb algorithms only replace rote work. Since creativity is not just what allows innovation but also forms the center of community and personal expression, we are also replacing those "soft" components of scentific exploration that eliminate the importance of the individual.
Actually, most of the paper seems a bit obvious from the computer science side. LLMs scale for really complex tasks, but they are neither correct nor complete. If combined with a tool that is correct (code verifiers, interactive theore provers), then we can get back a correct pipeline.
One vision in the article that stood out for me, was how formal proof assistants allow for large teams to collaborate on proving theorems. Imagine what we could achieve if we could do mathematics as a hive mind!
But that's basically what mathematics has been from day 0. What you mention as a hive mind presumably don't refer to a situation where individual minds and intimate reflection can be put out of the equation. On the other hand, mathematics are not possible outside a society which provides a large set of conveniences to leverage on, including communication tools such as a language.
The idea of neurosymbolic systems has been in the air a long time, but every time I look at the commentary of an article like this I’m surprised at number the “OMG why didn’t anyone think of this?” type of comments.
For a while I got the impression that an ideological undercurrent of “DL vs GOFAI” had gotten in the way of more widespread exploration of these ideas. Tao’s writing here changed my view to something more pragmatic, that being the formalization of the symbolic part of neurosymbolic AI requires too much manual intervention to easily scale. He is likely onto something by having an LLM in the loop with another system like Lean or Athena to iterate on the formalization process.
I know this guy is Fields medalist, but all his recent posts and now this publication lack any substance and actual contributions, so it sounds like he is more in the role of hyped twitter influencer than researcher.
LLM is probably not the right model for AIs strapped to a formal solver. But experience which has been gained with LLMs may help design those maths oriented models.
What's with the beef? In the paper Terence describes how he currently uses some imperfect but useful tools, which will surely change in the future if better tools appear. It does not say "LLM will be smarter than a kid bwahaha world domination".
It may only take one or two further iterations of improved capability (and integration with other tools, such as computer algebra packages and proof assistants) until the level of "(static simulation of a) competent graduate student" is reached, at which point I could see this tool being of significant use in research level tasks.
The beef is that many of us have enough grime on our hands. Many of us at the absolute elite frontier of this technology find its application in both technical reality and fundraising fantasy to be beneath any possible ideal.
Have you considered that children, and people in general, may be very significantly less intelligent than your baseline assumption?
A flaw in the Turing test is failing to specify which person is making the judgement. We're working with statistical distributions here and I would not bet on the intellect displayed by the LLM models being below that displayed by the human population today, let alone with more improvement to one or degradation to the other.
More concretely, if you sketch some normal distributions on a whiteboard for people vs machine based on how you see things, it should be hard to confidently claim minimal overlap.
Even without a definition of intelligence, this is not what the paper is about, which only mentions LLMs in passing. And LLMs can be useful even if they are wrong, because formal verification (though Lean and such) checks the result.
That's fine, and unrelated to the article in any way.
LLMs are way more useful than a child in many ways, some of which are discussed in the article. They don't need to be as intelligent as a child for anything proposed in the article.
This isn't really a meaningful prediction unless you define clearly your idea of what being "as intelligent as a precocious child" is, and how you would assess an LLM or any other system against that metric. Though I suppose you avoid the risk of having to move the goalposts later if you never set them up in the first place.
I have a great many regrets in life but if I died opposing Sam Altman and Fidji Simo and Larry Summers in the newest version of their oppressive lies that would be a good death.
"I have found it works surprisingly well for writing mathematical LaTeX, as well as formalizing in Lean; indeed, it assisted in writing this very article by suggesting several sentences as I was writing, many of which I retained or lightly edited for the final version. While the quality of its suggestions is highly variable, it can sometimes display an uncanny level of simulated understanding of the intent of the text."
Tao is one of the few mathematicians who is constantly singing the praises of specifically ChatGPT and now CoPilot. He does not appear to have any issues that his thought processes are logged by servers owned by a multi-billion dollar company.
He never mentions copyright or social issues. He never mentions results other than "writing LaTeX is easier".
> He does not appear to have any issues that his thought processes are logged by servers owned by a multi-billion dollar company.
His job is publishing his thoughts. They're not going to a single company, but everyone. If he gets results faster, we all get to see his thought process faster.
Ideally, chatgpt would be familiar with everything coming from researchers like him.
> constantly singing the praises of specifically ChatGPT and now CoPilot
Chatgpt is mentioned just once and only as a "such as" example.
Something I've noted about all advanced tools is that the inept fear them, the capable use them, and the elite embrace them wholeheartedly.
Everything from IDEs to Google search gets the same treatment.
I remember a colleague watching me edit code exclaiming that I was "Cheating!" because I had syntax highlighting and tab-completion.
Another coworker who had just failed to get answers after searching for "My PC crashed, how to fix?" kept telling me that the results "couldn't be trusted". He was reading Windows XP bug reports over a decade out-of-date for troubleshooting a Windows Server 2022 issue that manifests only on Azure.
Some people are afraid of these things and suspect there's a hidden agenda, others see things for what they are: Just tools, each fit for a particular purpose.
Is this phobia angle the new talking point? "If you fear our tool for a hefty subscription price while we are logging all your data, you are inept?"
My experience is exactly the opposite: Inept power users jump on the latest bandwagon to camouflage their incompetence. And like true power users they evangelize their latest toy whenever they can.
> Some people are afraid of these things and suspect there's a hidden agenda, others see things for what they are: Just tools, each fit for a particular purpose.
Well, there's also the arms-race scenario. A classic example is computer security: people only make computers more secure because people hack them, which makes hackers improve their stuff. This prisoner's dilemma scenario gives a deterministic force to technology that certainly makes it more than just a tool: it is a force that acts upon society that transcends human choice.
At this stage Tao should disclose whether he has any financial relationship with OpenAI, including stock ownership or even access to experimental models or more computational power than the average user.
I've never seen any academic hyping up a company like that, unless they explicitly have/had a financial relationship like Scott Aaronson.
Tenured professors are at no risk of losing jobs and have minimum business interests. The literal smartest person in the world will be the last person to lose his job anyway.
AI anxiety comes from a fear that AI will replace us, individuals or corporate entities alike. Tao is immune to these risks.
I did not read the pdf, but I think that LLM and Lean could be useful tools for mathematiceans to prove or refute theorems, but the creative idea that sparks knowledge and new theorems lies in the human, the others are tools that can help to reduce time and effort needed and so, indirectly, they can foster and enhance creativity. It also could mitigate some reasoning that require mechanical prove of many details. Anyway, what I just said seems simple and clear, and in no way would it be worth to be published in a high ranking math journal.
I apologize in that case, can you give a brief summary of what is the most important point of that pdf?, I don't know why but my gut feeling is to refuse to read something just based on people reputation. I know that Tao is a very bright mathematician but I also think that he doesn't know much about computer science or computer languages (my evidence is very slim here: once I noted that Tao was very happy with a very simple program, a trivial one, so I inferred from that he still has to learn a lot about programming). For now, it seems clear that the best he can do (for him and us) is to devote his time to math that is his best skill. But it could happen that he could use his best world IQ and math skills to learn how to use LLMs and Lean in a never seem before way to obtain something really valuable.
Today I don't think there is any evidence that such thing is going to happen. On one hand, in general, intelligent people are the first to learn how to use new tools in new or better ways, tools that are useful for what they are good at and are devote at. On the other hand, following that path detracts energy from the core of math that requires intuition and creativity and not so much mechanical proofs. On a third hand, there is always the money question that we can not see, that is because is in the third hand.
On HN here, we've spent the last few years talking and thinking a lot about LLMs, so the paper might not include much that would be surprising to math-curious HN'ers. However, there is a large cohort of research mathematicians out there that likely doesn't know much about modern AI; Terence is saying there's a little utility in last-gen models (GPT-4), and he expects a lot of utility out of combining next-gen models with Lean.
Again, not surprising if you read his blog, but I think publishing a full paper in AMS is a pretty important moment.