Hacker News new | ask | show | jobs
by spi 938 days ago
A decent start? It says absolutely nothing about how to solve it, except repeating the question. The part where it tries out the few first numbers is entirely wrong, given that 6 does _not_ satisfy the condition (2 does not divide 3+6=9) and 8 _does_ (2 does divide 4+8=12).

Amusingly, the list of the integers <= 100 satisfying the property is correct, and it contradicts itself from the previous paragraph. Maybe if GPT wasn't a one-directional autoregressive model but allowed itself to go back and edit the past, it would have caught up that discrepancy and fixed it - but no such architecture currently exists that would run in decent amount of time.

Given that GPT4 is not a model but a full product, behind the scenes it probably coded up and ran a small python code that translated the problem into code, executed it and got its solution for the first few integers. Which would be a good thing to do to start solving a problem like this, except you're not allowed to do that at the IMO, obviously.

Looking at the output of this program, it suggests that powers of primes could be a class of solution (or maybe even the only solutions? I guess that's all the problem was _really_ asking to prove, but having never qualified for the IMO myself, I can't be sure). In fact, for n = p^k, the divisors are [1, p, p^2, ..., p^{k-1}, p^k], and clearly always p^i divides p^{i+1} + p^{i+2} = p^i (p + p^2). I guess this small remark would have gained me a point at the IMO, only 41 to go ;-)

But the other side of the coin is that having those numbers written down in front of you and not even making a conjecture about powers of prime being the answer would really denote poor mathematical reasoning by GPT4. It's only really proving that it can understand what it's being asked, which, I admit, places it in a better position than maybe 90% of the human population, but unfortunately for GPT4 mathematics is the least democratic science of them all - it's always only the top-1 result that matters in the end.

P.S. Being a former mathematician currently working on deep learning, having (or building!) a model that can solve mathematical questions has always been my dream. I'm not even talking about something that can _prove_ things, even just that can understand and rephrase them in different settings (which in mathematics is very, very hard, even for a human). Or spot weaknesses in already stated down proofs. As a graduate student, having something I could chat about to ask silly question while studying a paper would have been a real game changer. Even for best-of-world professionals it would be useful: when the wrong proof about the ABC conjecture came out, it took months of work from the best minds of our world to read through it and disprove it. If Mochizuki had had some tool for automatically checking his proof (and a smaller ego, I guess) he could have caught that early on, saved everybody a lot of work and the whole world some useless drama.

And while we're closer than ever to reaching that, I think GPT4 is still quite a far way from it. But with the pace we've seen recently in AI evolution, who knows...

2 comments

(I'm also replying myself concerning the problem itself).

Unless I'm getting myself completely wrong, this also seems to be a very unusually simple problem for IMO's standards. I don't think I ever got myself solving one of them when I tried in the past, but if I'm not fooling myself the most natural approach for this one seems to bring to the solution:

1) Look at the two smallest divisors 1 < d_2 < d_3 of n. Then d_2 is necessarily a prime p, and d_3 is either p^2 or a different prime q. Let's first prove the latter can't be the case: if it where, looking at the biggest 3 divisors [n/q, n/p, n], we'd get that there's an integer a such that: n/q * a = n/p + n. Simplifying a bit, we get (1+p)q = ap, which is impossible because p divides neither p+1 nor q.

2) So, for n not to be a power of p, it must be 1 < d_2=p < d_3=p^2 < ... d_{k+1}=p^k < q. In particular, p^{k-1} must divide p^k+q. However this is also impossible, because p^{k-1} obviously divides p^k, but not q, so it can't divide the sum.

Gosh I feel like a grumpy old man saying "those youngsters, on my days we used to have harder problems than this" ;-)

haha yeah seems like there's like only three things to do.

1)notice it's prime powers

2)notice that 1,p,p^2 eventually leads to contradiction

3)use the last part of the factorization to rule out 1,p,q

I couldn't get GPT4 to do either 2 or 3 even with hints. It's surprising that either due to context length or something else I feel like its reasoning abilities are worse when you try to guide it. But maybe this is true for humans too.

https://chat.openai.com/share/ad42d09a-b366-4b21-b701-782fc8.... Will you re-evaluate how far GPT4 is from it from the fact that it can guess it's powers of primes?
Good to know it can do that, in the pasted chat above it didn't. To be honest, it surprised me it couldn't, this isn't exactly a very hard guess given the computation results. It doesn't convince me GPT4 is anywhere close to winning the IMO, though :-)
True. I let it continue and even prompted it with hints but it cannot find the proof that non-prime powers fail. It can however prove prime powers work.
Do we really know for sure that GPT4 has not seen this problem already?
It's a good point. The knowledge cut-off says Jan 2022, but in the past there's been reasons to suspect some data after the cut-off is in the model too. There's also the possibility that it has seen the problem somewhere else before Jan 2022.

Nevertheless, I think the more important question is whether GPT-4 is capable of

1) Listing all solutions less than 100

2) Figuring out the commonalities of the solutions

For 1) I have no doubt that the answer is yes based on its coding skills, in fact it is much stronger at coding than this. For 2) my subjective feeling in playing around with it is it's not consistent at similar problems but it can do it sometimes. Maybe in this case it has seen the list of prime powers <100 so it's very easy for it.