by benchmarkist 584 days ago
It will be a useful benchmark to validate claims by people like Sam Altman about having achieved AGI.
2 comments

Most humans can't solve these problems, so it's certainly possible to imagine a legitimate AGI that can't either.
But humans can solve these problems given enough time and domain knowledge. An LLM would never be able to solve them unless it gets smarter. That's the point.

It’s not about whether a random human can solve them. It’s whether AI, in general, can. Humans, in general, have proven to be able to solve them already.

I'm responding to this:

> It will be a useful benchmark to validate claims by people like Sam Altman about having achieved AGI.

I think it is possible to achieve AGI without creating an AGI that is an expert mathematician, and that it is possible to create a system that can do FrontierMath without achieving AGI. I.e. I think failure or success at FrontierMath is orthogonal to achieving AGI (though success at it may be a step on the way). Some humans can do it, and some AGIs could do it, but people and AI systems can have human-level intelligence without being able to do it. OTOH I think it would be hard to claim you have ASI if it can't do FrontierMath.

I think people just see FrontierMath as a goalpost that an AGI needs to hit. The term "artificial general intelligence" implies that it can solve any problem a human can. If it can't solve math problems that an expert human can, then it's not AGI by definition.

I think we have to keep in mind that humans have specialized. Some do law. Some do math. Some are experts at farming. Some are experts at dance history. It's not the average AI vs the average human. It's the best AI vs the best humans at one particular task.

The point with FrontierMath is that we can summon at least one human in the world who can solve each problem. No AI can in 2024.

Okay, sounds like different definitions.

If you have a single system that can solve any problem any human can, I'd call that ASI, as it's way smarter than any human. It's an extremely high bar, and before we reach it I think we'll have very intelligent systems that can do more than most humans, so it seems strange not to call those AGIs (they would meet the definition of AGI on Wikipedia [1]).

[1] https://en.wikipedia.org/wiki/Artificial_general_intelligenc...

> If you have a single system that can solve any problem any human can, I'd call that ASI

I don't think that's the popular definition.

AGI = solve any problem any human can. In this case, we've not reached AGI since it can't solve most FrontierMath problems.

ASI = intelligence far surpasses even the smartest humans.

If the definition of AGI is that it's more intelligent than the average human, you can argue that we already have AGI today. But no one thinks we have AGI today. Therefore, AGI is not Claude 3.5.

Hence, I think the most acceptable definition for AGI is that it can solve any problem any human can.

The reason for the AGI definition is to indicate a point where no human can provide more value than the AGI can. AGI should be able to replace all work efforts on its own, as long as it can scale.

ASI is when it is able to develop a much better version of itself to then iteratively go past all of that.

It is very much an open question just what an LLM can solve when allowed to generate an indefinite number of intermediate tokens and allowed to sample an arbitrary amount of text to ground itself.

There are currently no tools that let LLMs do this, and no one is building the tools for answering open-ended questions.

That's correct. Thanks for clarifying for me, because I have gotten tired of the comparison to "99% of humans can't do this" as a counter-argument to AI hype criticism.
AGI should be able to do anything the best humans can do. ASI is when it does everything better than the best humans.
Those thresholds look the same to me, personally.

An AI that can be onboarded to a random white-collar job and be interchangeably integrated into organisations surely is AGI for all practical purposes, without eliminating the value of 100% of human experts.

If an AI achieved 100% on this benchmark, it would indicate super-intelligence in the field of mathematics. But depending on what else it could do, it may fall short of general intelligence across all domains.