| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gjdjcjdnxnvjd 3572 days ago
	So does this mean that the era of automated theorem proving and AI-driven mathematics is finally upon us? I'm no expert, but this seems pretty groundbreaking.

5 comments

tnecniv 3572 days ago

My understanding from the article (while I would like to read the paper, I don't have time at the moment) is that it assigns probabilities that a conjecture is correct and improves these estimates over time. As such, something will only really be proven true when the probability hits 1. The summary says that this will occur in the limit, but that might take as long as proving things the traditional way and mathematicians don't like things that are probably true but not proven.

That said, I can think of a number of uses for such an algorithm. If you load it full of conjectures in your field that are known to be true, it will might help you hone what problems are worth exploring by providing guess at how likely it is you can prove a statement you are pondering.

jey 3571 days ago

This algorithm is not practical to implement using any computer that would fit on Earth. It's primarily of theoretical interest.

catnaroek 3571 days ago

> something will only really be proven true when the probability hits 1

Careful. An event can have probability 1 even if its complement isn't empty: https://en.wikipedia.org/wiki/Almost_surely

joshcohen 3571 days ago

In this setting we're using discrete probabilities so we don't have to worry:)

tnecniv 3571 days ago

Good catch. The nuances of the continuum are many and subtle.

JohnStrange 3571 days ago

It's a pretty cool idea but nothing groundbreaking. There is plenty of AI-driven mathematics already, e.g. the Mizar theorem prover. Pollock used to advocate the idea of continuously trying to prove theorems that might be useful later. It's an underrated idea, in my opinion, and his system of defeasible reasoning based on graph theory is still one of the best.

What's new in this paper is the probabilistic component, trying to guess the outcome of complicated proofs. That's a neat idea, but nothing revolutionary. It may give rise to nice shortcuts for better efficiency.

The real problem is making the machine get a good hunch what to prove, so it doesn't find useful theorems just randomly. I'm not working in this field, so correct me if I'm wrong, but that seems to be rather hard. In any case, as far as I know most automated theorem provers are only semi-automatic, you have to give them an idea about which direction to go and which proof strategy to use.

chriswarbo 3571 days ago

> The real problem is making the machine get a good hunch what to prove, so it doesn't find useful theorems just randomly. I'm not working in this field, so correct me if I'm wrong, but that seems to be rather hard. In any case, as far as I know most automated theorem provers are only semi-automatic, you have to give them an idea about which direction to go and which proof strategy to use

I'm working in exactly this area, and it's very nice to see it mentioned occasionally as a useful direction!

There's a bunch of nice work being done on this problem; I'm mostly familiar with (roughly chronologically) IsaScheme, IsaCoSy, QuickSpec, Hipspec and Hipster. These take in a bunch of function definitions and output equations about them; they work by enumerating (type-correct) terms and using random testing (QuickCheck) to quickly separate unequal terms from each other, then they apply automated theorem provers to the remainder.

There are also more first-order, less computationally-focused systems for generating theorems out there, like HR and Graffiti.

tom_mellior 3571 days ago

> AI-driven mathematics already, e.g. the Mizar theorem prover

Can you tell us more about how Mizar is AI-driven? I have never worked with it, but my understanding was that it was a fairly normal proof assistant. That is, proofs are written by humans, and some smallish boring intermediate steps are done using a regular first-order prover. Like with Coq or Isabelle.

Does Mizar do something else as well? Does it use AI to make conjectures?

JohnStrange 3571 days ago

Not really, I know people who work on Mizar but I'm not myself involved. Anyway it's also subjective what you call AI-driven and what not.

I can comment on another thing, though. Even very simple concepts like well-foundedness conditions go beyond first-order logic and these provers are based on pretty expressive higher-order type systems. AFAIK, they can prove fairly substantial theorems.

tom_mellior 3571 days ago

I didn't say that the proof assistants were first order! I'm well aware that that's not the case. But AFAIK all the automation they use is first order. Higher order proof steps like "do this by induction on n" are always introduced by hand (at least in Coq and Isabelle), even if the rest of the proof is automatic.

JohnStrange 3571 days ago

I don't understand what you mean by "all the automation they do is first order". They are based on higher order proof theories, e.g. higher order tablaux, and they implement higher order unification. Otherwise they would be first-order provers and thus much more limited.

But Yes, most common higher-order provers are semi-automatic, you need to give them a hint about which proof strategy to use. That's mainly because they are used that way, not any principal limitation. You won't find many mathematicians who are interested in a theorem prover to spit out some (alleged) theorem by itself, and then let the mathematician check whether it's useful.

The only fully automatic higher-order theorem prover that I know of is ETPS, it will select proof strategies by itself if you don't indicate them. But it's also one of the oldest and slowest and mainly just used for teaching logic.

tom_mellior 3571 days ago

> I don't understand what you mean by "all the automation they do is first order".

I meant that the tactics of Coq that do reasoning for you, and the internal/external provers of Isabelle, are first order. I was probably partly wrong: you are right that some of them do use higher-order unification. But when in Coq I use "auto" or "omega" or whatever tactic to solve a goal, no higher-order tableaux are in use as far as I know. I have to massage the goal until I get it into a form that is palatable to the first-order automatic provers. Similarly, when I write an Isabelle proof like "from A have B by X; from this have C by Y; hence D by Z", the proof methods X, Y, Z are first order, often off-the-shelf SMT provers. Alternatively, there are also some built-in methods that use simple equational reasoning with higher-order unification, yes.

Let me know if I'm wrong about the details of this! Anyway, none of this means that you cannot prove complex higher-order stuff in these systems. You just can't do it automatically.

And, coming back to the start of this subthread, I don't think Mizar is really different in this regard.

chriswarbo 3571 days ago

Keep in mind that theorem proving was one of the very first applications of AI (e.g. Newell and Simon's "Logic Theorist" in the 1950s), so I would take any excitement with as much salt as any other AI claim ;)

There are a few hurdles to overcome before computer/AI-assisted mathematics really 'takes off', for example:

Almost all mathematics is aimed at a human reader; arguments are written in prose, and formula markup only exists to guide the appearance when rendered, i.e. LaTeX; just like HTML, it's technically all marked up and machine readable, but the semantic information we can extract is very low.

Whilst OCR, etc. will keep progressing, I think the real solution is to have people (or their tools) place semantics first and rendering second, e.g. with formats like OpenMath; to do this, we need to provide compelling reasons, e.g. automated assistance, inclusion in repositories, automated citations for those who use your results, etc.

Another problem is that there are many incompatible systems; if some result is formalised in a different system to the one you're using, your best option is to either switch system or attempt to re-prove it yourself. There are ongoing efforts to provide a more abstract overlay, so that results from one system can be re-used in another (providing their logics are somehow compatible), e.g. https://kwarc.info/projects

Another is how low-level automated reasoning currently is; even something which looks like a pretty clear instruction, like a step which says "by induction", involves such a huge search space that existing algorithms blow up. Working mathematicians, quite rightly, get fed up of the tedium of spelling out each individual step in such excruciating detail. It's just like with software, but imagine that you've spent your career working with a super fast Prolog system with a well-organised standard library built up over a thousand years, and you're then asked to program machine code by flipping switches on a slow machine with no existing software ;)

joshcohen 3571 days ago

It's not clear if there are any practical applications to automated theorem proving (and that's not the goal of the work). This is primarily motivated by very "theoretical" decision theory.

eli_gottlieb 3571 days ago

How do you manage to claim that your computable logical inductor dominates the incomputable universal semimeasure?

tbt 3571 days ago

The universal semimeasure is "computably approximable from below", aka (lower) semicomputable, meaning that you can computably list out all the rational numbers below a given value assigned by the measure. The probabilities assigned by a logical inductor are computably approximable reals, which (confusingly) is weaker than lower semicomputable; it just means that you can compute a sequence of rationals that converges to the real, with a possibly uncomputable convergence rate.

joshcohen 3571 days ago

I'm not an author but this property with respect to a universal 'semimeasure' holds in the limit as time goes to infinity, as do most stated properties of logical inductors. The fact that they're computable is academic.

tbt 3571 days ago

This is right, but perhaps misleading; most of the properties are "asymptotic", meaning that they may take an extremely long time to hold, but they hold at finite times. For example, "provability induction" says that if you have a (polytime computable) sequence of sentences phi_n, all of which happen to be provable (possibly with fast-growing proof lengths), then P_n(phi_n) limits to 1. This means that on day n the logical inductor P is confident of phi_n, even though phi_n may take much longer than n steps to prove.

chriswarbo 3571 days ago

The key is that the generator of such sequences has limited resources; once the inductor has learned the (implicit or explicit) program generating these sentences (perhaps by simulation, as in Solomonoff induction), it can apply that reasoning to the individual sentences; e.g. if it believes it's verified the correctness of the generator, that is enough justification to believe the sentences being produced.

tbt 3571 days ago

Yep, that's the idea :)

This is in the vein of "prediction using ensembles of experts" methods such as SI, with a twist that the experts are traders, not forecasters; they don't have to have opinions on everything the logical inductor has to predict, the traders just have to point out particular ways that the logical inductor is being silly (and then the logical inductor corrects those problems).

joshcohen 3571 days ago

Although P_n can't be computed in anything like n steps.

tbt 3571 days ago

Yeah. It's still surprising to me, though; P_n can predict extremely long-running computations, even ones with a much longer runtime than P_n, at least as well as any quickly computable "pattern". (The algorithm in the paper uses a (roughly) double-exponential-time algorithm to predict arbitrarily long-running programs, in a way that can't be improved upon by any polytime computable method.)

bbctol 3571 days ago

This is MIRI; they do really interesting theoretical philosophy and worryingly small amounts of practical research.

Jach 3571 days ago

At least public practical research. But I've long wondered what their bank account would look like if they had a side venture into exploiting narrow AI for commercial use rather than rely on donations.

bonoboTP 3571 days ago

Maybe they have a problem attracting those types of researchers for some reason.

jey 3571 days ago

What is your definition of practical?

jpt4 3571 days ago

MIRI is entirely Blue Team - they work to create theoretical [0] safeguards on AI agents yet to be developed. I've long envisioned a counterpart Red Team that does nothing but build AIs that attempt to subvert these safety features, since the rest of the world of non Friendly AI [1] research is only unco-ordinated para-red behavior.

[0] In the sense of "valid under these known precepts", not "speculative".

[1] Non "Friendly AI", not "non-Friendly" AI.

gallabytes 3571 days ago

I find that perception fairly surprising, as for a very long time it felt like we did more red team than blue team. I do acknowledge that this has been changing recently, but only significantly in the context of building on the results in this paper.

jpt4 3571 days ago

Would you please direct me to an example of MIRI's Red Team efforts that isn't the recent "Malevolent AI" paper [0]? Adherence to the belief that UFAI is a threshold-grade existential risk seems to compel a "define first, delay implementation" strategy, lest any step forward be the irrevocable wrong one.

[0] http://arxiv.org/abs/1605.02817

gallabytes 3568 days ago

This makes me think the only thing we disagree on is the meaning of the words "red team" and "blue team" :)

When I say it feels like we spend a lot of time red teaming, that means I think we spend somewhere between 30 and 60% of research time trying to break things and see how they fail. This is fully compatible with not immediately implementing things - it's much less expensive to break something /before/ you build it.

bonoboTP 3571 days ago

My concept of practical is stuff like: implementing things, doing experiments with prototypes, producing stuff that is imperfect but does something and can be improved upon.

Theoretical stuff is like: proving theorems, conceptualizing the task at hand, philosophical inquiry into the nature of agents/intelligence/reasoning/goals/human values

I'm not trying to argue which is more important, but surely MIRI focuses more on the theoretical.