So does this mean that the era of automated theorem proving and AI-driven mathematics is finally upon us? I'm no expert, but this seems pretty groundbreaking.
My understanding from the article (while I would like to read the paper, I don't have time at the moment) is that it assigns probabilities that a conjecture is correct and improves these estimates over time. As such, something will only really be proven true when the probability hits 1. The summary says that this will occur in the limit, but that might take as long as proving things the traditional way and mathematicians don't like things that are probably true but not proven.
That said, I can think of a number of uses for such an algorithm. If you load it full of conjectures in your field that are known to be true, it will might help you hone what problems are worth exploring by providing guess at how likely it is you can prove a statement you are pondering.
It's a pretty cool idea but nothing groundbreaking. There is plenty of AI-driven mathematics already, e.g. the Mizar theorem prover. Pollock used to advocate the idea of continuously trying to prove theorems that might be useful later. It's an underrated idea, in my opinion, and his system of defeasible reasoning based on graph theory is still one of the best.
What's new in this paper is the probabilistic component, trying to guess the outcome of complicated proofs. That's a neat idea, but nothing revolutionary. It may give rise to nice shortcuts for better efficiency.
The real problem is making the machine get a good hunch what to prove, so it doesn't find useful theorems just randomly. I'm not working in this field, so correct me if I'm wrong, but that seems to be rather hard. In any case, as far as I know most automated theorem provers are only semi-automatic, you have to give them an idea about which direction to go and which proof strategy to use.
> The real problem is making the machine get a good hunch what to prove, so it doesn't find useful theorems just randomly. I'm not working in this field, so correct me if I'm wrong, but that seems to be rather hard. In any case, as far as I know most automated theorem provers are only semi-automatic, you have to give them an idea about which direction to go and which proof strategy to use
I'm working in exactly this area, and it's very nice to see it mentioned occasionally as a useful direction!
There's a bunch of nice work being done on this problem; I'm mostly familiar with (roughly chronologically) IsaScheme, IsaCoSy, QuickSpec, Hipspec and Hipster. These take in a bunch of function definitions and output equations about them; they work by enumerating (type-correct) terms and using random testing (QuickCheck) to quickly separate unequal terms from each other, then they apply automated theorem provers to the remainder.
There are also more first-order, less computationally-focused systems for generating theorems out there, like HR and Graffiti.
> AI-driven mathematics already, e.g. the Mizar theorem prover
Can you tell us more about how Mizar is AI-driven? I have never worked with it, but my understanding was that it was a fairly normal proof assistant. That is, proofs are written by humans, and some smallish boring intermediate steps are done using a regular first-order prover. Like with Coq or Isabelle.
Does Mizar do something else as well? Does it use AI to make conjectures?
Not really, I know people who work on Mizar but I'm not myself involved. Anyway it's also subjective what you call AI-driven and what not.
I can comment on another thing, though. Even very simple concepts like well-foundedness conditions go beyond first-order logic and these provers are based on pretty expressive higher-order type systems. AFAIK, they can prove fairly substantial theorems.
I didn't say that the proof assistants were first order! I'm well aware that that's not the case. But AFAIK all the automation they use is first order. Higher order proof steps like "do this by induction on n" are always introduced by hand (at least in Coq and Isabelle), even if the rest of the proof is automatic.
I don't understand what you mean by "all the automation they do is first order". They are based on higher order proof theories, e.g. higher order tablaux, and they implement higher order unification. Otherwise they would be first-order provers and thus much more limited.
But Yes, most common higher-order provers are semi-automatic, you need to give them a hint about which proof strategy to use. That's mainly because they are used that way, not any principal limitation. You won't find many mathematicians who are interested in a theorem prover to spit out some (alleged) theorem by itself, and then let the mathematician check whether it's useful.
The only fully automatic higher-order theorem prover that I know of is ETPS, it will select proof strategies by itself if you don't indicate them. But it's also one of the oldest and slowest and mainly just used for teaching logic.
> I don't understand what you mean by "all the automation they do is first order".
I meant that the tactics of Coq that do reasoning for you, and the internal/external provers of Isabelle, are first order. I was probably partly wrong: you are right that some of them do use higher-order unification. But when in Coq I use "auto" or "omega" or whatever tactic to solve a goal, no higher-order tableaux are in use as far as I know. I have to massage the goal until I get it into a form that is palatable to the first-order automatic provers. Similarly, when I write an Isabelle proof like "from A have B by X; from this have C by Y; hence D by Z", the proof methods X, Y, Z are first order, often off-the-shelf SMT provers. Alternatively, there are also some built-in methods that use simple equational reasoning with higher-order unification, yes.
Let me know if I'm wrong about the details of this! Anyway, none of this means that you cannot prove complex higher-order stuff in these systems. You just can't do it automatically.
And, coming back to the start of this subthread, I don't think Mizar is really different in this regard.
Keep in mind that theorem proving was one of the very first applications of AI (e.g. Newell and Simon's "Logic Theorist" in the 1950s), so I would take any excitement with as much salt as any other AI claim ;)
There are a few hurdles to overcome before computer/AI-assisted mathematics really 'takes off', for example:
Almost all mathematics is aimed at a human reader; arguments are written in prose, and formula markup only exists to guide the appearance when rendered, i.e. LaTeX; just like HTML, it's technically all marked up and machine readable, but the semantic information we can extract is very low.
Whilst OCR, etc. will keep progressing, I think the real solution is to have people (or their tools) place semantics first and rendering second, e.g. with formats like OpenMath; to do this, we need to provide compelling reasons, e.g. automated assistance, inclusion in repositories, automated citations for those who use your results, etc.
Another problem is that there are many incompatible systems; if some result is formalised in a different system to the one you're using, your best option is to either switch system or attempt to re-prove it yourself. There are ongoing efforts to provide a more abstract overlay, so that results from one system can be re-used in another (providing their logics are somehow compatible), e.g. https://kwarc.info/projects
Another is how low-level automated reasoning currently is; even something which looks like a pretty clear instruction, like a step which says "by induction", involves such a huge search space that existing algorithms blow up. Working mathematicians, quite rightly, get fed up of the tedium of spelling out each individual step in such excruciating detail. It's just like with software, but imagine that you've spent your career working with a super fast Prolog system with a well-organised standard library built up over a thousand years, and you're then asked to program machine code by flipping switches on a slow machine with no existing software ;)
It's not clear if there are any practical applications to automated theorem proving (and that's not the goal of the work). This is primarily motivated by very "theoretical" decision theory.
The universal semimeasure is "computably approximable from below", aka (lower) semicomputable, meaning that you can computably list out all the rational numbers below a given value assigned by the measure. The probabilities assigned by a logical inductor are computably approximable reals, which (confusingly) is weaker than lower semicomputable; it just means that you can compute a sequence of rationals that converges to the real, with a possibly uncomputable convergence rate.
I'm not an author but this property with respect to a universal 'semimeasure' holds in the limit as time goes to infinity, as do most stated properties of logical inductors. The fact that they're computable is academic.
This is right, but perhaps misleading; most of the properties are "asymptotic", meaning that they may take an extremely long time to hold, but they hold at finite times. For example, "provability induction" says that if you have a (polytime computable) sequence of sentences phi_n, all of which happen to be provable (possibly with fast-growing proof lengths), then P_n(phi_n) limits to 1. This means that on day n the logical inductor P is confident of phi_n, even though phi_n may take much longer than n steps to prove.
The key is that the generator of such sequences has limited resources; once the inductor has learned the (implicit or explicit) program generating these sentences (perhaps by simulation, as in Solomonoff induction), it can apply that reasoning to the individual sentences; e.g. if it believes it's verified the correctness of the generator, that is enough justification to believe the sentences being produced.
This is in the vein of "prediction using ensembles of experts" methods such as SI, with a twist that the experts are traders, not forecasters; they don't have to have opinions on everything the logical inductor has to predict, the traders just have to point out particular ways that the logical inductor is being silly (and then the logical inductor corrects those problems).
Yeah. It's still surprising to me, though; P_n can predict extremely long-running computations, even ones with a much longer runtime than P_n, at least as well as any quickly computable "pattern". (The algorithm in the paper uses a (roughly) double-exponential-time algorithm to predict arbitrarily long-running programs, in a way that can't be improved upon by any polytime computable method.)
At least public practical research. But I've long wondered what their bank account would look like if they had a side venture into exploiting narrow AI for commercial use rather than rely on donations.
MIRI is entirely Blue Team - they work to create theoretical [0] safeguards on AI agents yet to be developed. I've long envisioned a counterpart Red Team that does nothing but build AIs that attempt to subvert these safety features, since the rest of the world of non Friendly AI [1] research is only unco-ordinated para-red behavior.
[0] In the sense of "valid under these known precepts", not "speculative".
I find that perception fairly surprising, as for a very long time it felt like we did more red team than blue team. I do acknowledge that this has been changing recently, but only significantly in the context of building on the results in this paper.
Would you please direct me to an example of MIRI's Red Team efforts that isn't the recent "Malevolent AI" paper [0]? Adherence to the belief that UFAI is a threshold-grade existential risk seems to compel a "define first, delay implementation" strategy, lest any step forward be the irrevocable wrong one.
This makes me think the only thing we disagree on is the meaning of the words "red team" and "blue team" :)
When I say it feels like we spend a lot of time red teaming, that means I think we spend somewhere between 30 and 60% of research time trying to break things and see how they fail.
This is fully compatible with not immediately implementing things - it's much less expensive to break something /before/ you build it.
My concept of practical is stuff like: implementing things, doing experiments with prototypes, producing stuff that is imperfect but does something and can be improved upon.
Theoretical stuff is like: proving theorems, conceptualizing the task at hand, philosophical inquiry into the nature of agents/intelligence/reasoning/goals/human values
I'm not trying to argue which is more important, but surely MIRI focuses more on the theoretical.
That said, I can think of a number of uses for such an algorithm. If you load it full of conjectures in your field that are known to be true, it will might help you hone what problems are worth exploring by providing guess at how likely it is you can prove a statement you are pondering.