by gmt2027 941 days ago
We have an algorithm and computational hardware that will tune a universal function approximator to fit any dataset, with intelligence emerging as it discovers abstractions, patterns, features and hierarchies.

So far, we have not found hard limits that cannot be overcome by scaling the number of model parameters, increasing the size and quality of the training data or, very infrequently, adopting a new architecture.

The number of model parameters required to achieve a defined level of intelligence is a function of the architecture and training data. The important question is: what is N, the number of model parameters at which we cross an intelligence threshold and it becomes theoretically possible to solve mathematics problems at a research level, given an optimal architecture that we may not yet have discovered? Our understanding does not extend to the level where we can predict N, but I doubt that anyone still believes it is infinite after seeing what GPT4 can do.

This claim here is essentially a discovery that N may be much closer to where we are with today's largest models. Researchers at the absolute frontier are more likely to be able to gauge how close they are to a breakthrough of that magnitude from how quickly they are blowing past less impressive milestones like grade school math.

My intuition is that we are in a suboptimal part of the search space and it is theoretically possible to achieve GPT4 level intelligence with a model that is orders of magnitude smaller. This could happen when we figure out how to separate the reasoning from the factual knowledge encoded in the model.

1 comments

Intelligence isn't a function unless you're talking about a function over every possible state of the universe.
There are well described links between intelligence and information theory. Intelligence is connected to prediction and compression as measures of understanding.

Intelligence has nothing specific to do with the universe as we know it. Any universe will do: a simulation, images or a set of possible tokens. The universe is every possible input. The training set is a sampling drawn from the universe. LLMs compress this sampling and learn the processes and patterns behind it so well that they can predict what should come next without any direct experience of our world.
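
To make the prediction/compression link concrete, here is a minimal sketch (my own illustration, not anything an LLM actually runs). It assumes byte-valued symbols and a toy adaptive count model with add-one smoothing; the name `code_length_bits` is made up for the example. Each symbol costs -log2 p(symbol | context) bits, which is the length an ideal arithmetic coder would approach, so a model that predicts better compresses better.

```python
import math
from collections import defaultdict

ALPHABET_SIZE = 256  # assumption: symbols are bytes


def code_length_bits(data: bytes, order: int) -> float:
    """Ideal code length (in bits) of `data` under an adaptive model.

    The model predicts each byte from the previous `order` bytes using
    add-one smoothed counts of what it has seen so far, then updates.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]  # b"" when order == 0
        p = (counts[ctx][sym] + 1) / (totals[ctx] + ALPHABET_SIZE)
        bits -= math.log2(p)             # cost of encoding this symbol
        counts[ctx][sym] += 1            # learn from the symbol just seen
        totals[ctx] += 1
    return bits


data = b"ab" * 500
print("no model :", 8 * len(data), "bits")
print("order 0  :", round(code_length_bits(data, 0)), "bits")
print("order 1  :", round(code_length_bits(data, 1)), "bits")
```

On this repetitive input the order-1 model, which has picked up the "a is followed by b" pattern, needs fewer bits than the order-0 model, which in turn beats a flat 8 bits per byte.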

All machine learning models and neural networks are pure functions. Arguing that no function can have intelligence as a property is equivalent to claiming that artificial intelligence is impossible.

Intelligence must inherently be a function unless there is a third form of cause-effect transition that can't be modelled as a function of determinism and randomness.
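
A rough sketch of what "a function of determinism and randomness" could look like, assuming a toy softmax sampler rather than any real model (the function names and the `seed` parameter are made up for illustration). The distribution is computed deterministically, and the sampling step becomes a pure function once the randomness is passed in as an explicit input.

```python
import math
import random


def next_token_distribution(logits):
    """Deterministic part: the same logits always give the same probabilities."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, seed):
    """Stochastic part, written as a pure function of (logits, seed)."""
    probs = next_token_distribution(logits)
    rng = random.Random(seed)                    # local generator, no global state
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [1.0, 1.0, 1.0]
# Same inputs, same output: exactly one element of Y for each element of X.
print(sample_token(logits, seed=7) == sample_token(logits, seed=7))
# Varying the seed input is what produces the apparent randomness.
print(sorted({sample_token(logits, seed=s) for s in range(100)}))
```

Under this reading, the input set X is (logits, seed) pairs instead of logits alone.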
Functions are by definition not random. Randomness would break: "In mathematics, a function from a set X to a set Y assigns to each element of X exactly one element of Y"
"Function" has (at least) two meanings. The last clause is not talking about functions in the mathematical sense. It could have been worded more clearly, sure.