
by lufenialif2 375 days ago

Still no information on the amount of compute needed; would be interested to see a breakdown from Google or OpenAI on what it took to achieve this feat.

Something that was hotly debated in the thread with OpenAI's results:

"We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions."

It seems that the answer to whether or not a general model could perform such a feat is that the models were trained specifically on IMO problems, which is what a number of folks expected.

Doesn't diminish the result, but doesn't seem too different from classical ML techniques if quality of data in = quality of data out.

5 comments

dvh 375 days ago

OK, but when this is reported by mass media, which never use SI units and instead use units like Libraries of Congress or elephants, what kind of unit should they use to compare the computational energy of AI vs. children?

thrance 375 days ago

If the models that got a gold medal are anything like those used on ARC-AGI, then you can bet they wrote an insane amount of text trying to reason their way through these problems. Like, several bookshelves' worth of writing.

So funnily enough, "the AI wrote x times the Library of Congress to get there" is a good enough comparison.

rfurmani 375 days ago

Dollars of compute at market rate is what I'd like to see, to check whether calling this tool would cost $100 or $100,000

gus_massa 375 days ago

4.5 hours × 2 "days", 100 watts including support systems.

I'm not sure how to implement the "no calculator" rule :) but for this kind of problem it's not critical.

Total = 900 Wh = 3.24 MJ

qnleigh 375 days ago

100 watts seems very low. A single Nvidia GeForce RTX 5090 is rated at ~600 watts. Probably they are using many GPUs/TPUs in parallel.

gus_massa 375 days ago

I forgot to explain in my comment, but my calculation is for humans.

If the computer uses ~600 W, let's give it 45+45 minutes and we are even :) If they want to use many GPUs ...
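For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The 100 W human figure and the ~600 W single-GPU figure are just the rough numbers quoted in this thread, not measurements, and the helper names are made up for illustration.

```python
def energy_wh(power_watts: float, hours: float) -> float:
    """Energy in watt-hours for a constant power draw."""
    return power_watts * hours


def wh_to_mj(wh: float) -> float:
    """Convert watt-hours to megajoules (1 Wh = 3600 J)."""
    return wh * 3600 / 1e6


# Human contestant: assumed 100 W, two sessions of 4.5 hours each.
human_wh = energy_wh(100, 4.5 * 2)

# Machine: assumed single ~600 W GPU, running for 45 + 45 minutes.
gpu_wh = energy_wh(600, (45 + 45) / 60)

print(f"human: {human_wh:.0f} Wh = {wh_to_mj(human_wh):.2f} MJ")
print(f"gpu:   {gpu_wh:.0f} Wh = {wh_to_mj(gpu_wh):.2f} MJ")
```

Under those assumptions both come out to the same energy budget, which is the "we are even" point. Any real run on many accelerators in parallel would scale the second number accordingly.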

lufenialif2 375 days ago

Convert libraries, elephants, etc. into SI units, of course! Otherwise, they aren't really comparable...

dortlick 375 days ago

Kilocalories. A unit of energy equal to 4184 joules.
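As a rough sketch of that conversion, applied to the 3.24 MJ human estimate from elsewhere in this thread (the function name is hypothetical, and the input is that commenter's back-of-the-envelope figure, not a measured value):

```python
JOULES_PER_KCAL = 4184  # 1 kilocalorie = 4184 joules


def joules_to_kcal(joules: float) -> float:
    """Convert an energy in joules to kilocalories."""
    return joules / JOULES_PER_KCAL


# 3.24 MJ is the thread's estimate for one human contestant over both days.
human_kcal = joules_to_kcal(3.24e6)
print(f"{human_kcal:.0f} kcal")
```

That works out to somewhere between 774 and 775 kcal for the whole contest, under the thread's 100 W assumption.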

gjm11 375 days ago

Human IMO contestants are also trained specifically on IMO problems.

pfortuny 375 days ago

They can train it on “Crux Mathematicorum” and similar journals, which are collections of “interesting” problems and their solutions.

https://cms.math.ca/publications/crux

nicce 375 days ago

Some unofficial comparison with costs of public models (performing worse): https://matharena.ai/imo/

So the real cost is probably much higher.

vonneumannstan 375 days ago

>it seems that the answer to whether or not a general model could perform such a feat is that the models were trained specifically on IMO problems, which is what a number of folks expected.

Not sure that's exactly what that means. It's already likely the case that these models contained IMO problems and solutions from pretraining. It's possible this means they were present in the system prompt or something similar.

AlotOfReading 375 days ago

Does the IMO reuse problems? My understanding is that new problems are submitted each year and 6 are selected for each competition. The submitted problems are then published after the IMO has concluded. How would the training data contain unpublished, newly submitted problems?

Obviously the training data contained similar problems, because that's what every IMO participant already studies. It seems unlikely that they had access to the same problems though.

AlanYx 375 days ago

The IMO doesn't reuse problems, but Terence Tao has a Mastodon post where he explains that the first five (of six) problems are generally ones where existing techniques can be leveraged to get to the answer. The sixth problem requires considerable originality. Notably, neither Gemini nor OpenAI's model solved the sixth problem. Still quite an achievement though.

apayan 375 days ago

Do you have another source for that? I checked his Mastodon feed and don't see any mention of the source of the questions from the IMO.

https://mathstodon.xyz/@tao

Davidzheng 375 days ago

Strange statement. It's certainly not true in general (3 and 6 are typically the hardest, but they aren't fundamentally different in nature from the other questions). This year P6 did seem to be by far the hardest, but this post hoc statement should be read cautiously.

vonneumannstan 375 days ago

>How would the training data contain unpublished, newly submitted problems?

I don't think either I or OP suggested it did.

sottol 375 days ago

Or that they did significant retraining to boost IMO performance, creating a more specialized model at the cost of general-purpose performance.