| HN Mirror


by rad_val 17 days ago

The strongest argument for this is structural: what LLMs are.

In a brutally simplistic way: each token is represented as a high-dimensional vector. LLMs operate on those vectors. They are the true, underlying meaning of the token for the LLM. Think of it as 1000+ ways to think of that word/token. Those meanings are baked in at training time. So, LLMs might be able to cross-reference them and solve a class of problems that flew under our radar, but can't come up with revolutionary theories that were never in the training set.

Of course, they will help win a Nobel in the years to come, no doubt, but they can't speak mathematics we can't understand (beyond simple obfuscation) and won't discover anything substantial on their own.

3 comments

resident423 17 days ago

> but can't come up with revolutionary theories that were never in the training set.

Can you elaborate? I don't think the solution to the unit distance problem was in the training set, but I'm guessing you mean there's some higher bar for revolutionary theories that LLMs can't reach? If so, where do you expect the limit to be?


redox99 17 days ago

Instead of going into a long technical argument about why your description of LLMs is flawed, I'll go straight to the point, because people keep moving the goalposts.

What exact problem would need to be solved by LLMs to convince you that they DO discover novel solutions?


rad_val 17 days ago

I'm more interested in why you think my understanding is flawed, honestly. I thought I distilled it decently well in two sentences. The bottom line is: in this hyperdimensional space you can find relationships that are not easily distinguished by human minds, but the corpus is still fixed. An LLM can't truly know anything beyond its training data.


redox99 17 days ago

> Think of it as 1000+ ways to think of that word/token

I assume you used 1000 because that's in the ballpark of the vector size. But these are not independent scalars, as if each one stored a single property. Just as in 2D you have 4 quadrants (and can subdivide further), a vector of size 1000 can encode an enormous amount of meaning.
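To put rough numbers on the quadrant analogy, here is a minimal Python sketch. The function names are invented for illustration, and the random vectors only stand in for embeddings; nothing here reflects how a particular model lays out its vector space. It counts sign patterns (the n-dimensional analogue of quadrants) and estimates how close to orthogonal two random directions are as the dimension grows.

```python
import math
import random


def orthant_count(dim: int) -> int:
    """Number of sign patterns ("quadrants") in `dim` dimensions: 2**dim."""
    return 2 ** dim


def random_unit_vector(dim: int, rng: random.Random) -> list[float]:
    """Random direction: Gaussian components, normalized to length 1."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_abs_cosine(dim: int, pairs: int, seed: int = 0) -> float:
    """Average |cosine similarity| over `pairs` random pairs of directions.

    Values near 0 mean random directions are almost orthogonal, so many
    distinct directions can coexist without interfering much.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))
    return total / pairs


if __name__ == "__main__":
    print(orthant_count(2))                  # the 4 quadrants of the plane
    print(len(str(orthant_count(1000))))     # digit count of 2**1000
    print(mean_abs_cosine(2, 50))            # random 2D directions overlap a lot
    print(mean_abs_cosine(1000, 50))         # random 1000D directions barely overlap
```

The point is only that capacity does not scale like "1000 separate properties": the number of distinguishable regions and nearly independent directions grows far faster than the dimension itself.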

> Those meanings are baked in at training time. So, LLMs might be able to cross-reference them and solve a class of problems that flew under our radar, but can't come up with revolutionary theories that were never in the training set.

There's a lot of jumping to conclusions here, but I'll try to answer more generally.

This idea of how LLMs work is mostly a way to build an intuition, just as with a CNN you'd say to imagine that one layer does edge detection, and so on. To some degree you can detect those kinds of behavior, but a NN is a VERY general architecture. It needn't work the way you describe: it can approximate essentially any function, and running in a loop with a scratchpad (basically an agent) it is Turing complete.

Even ignoring that, this part is misleading:

> Those meanings are baked in at training time.

Being baked in at training time does not mean the model didn't build novel meanings during training.

This is even more significant when you take into account post-training RL.

A simple proof that transformers can generate novel, superhuman solutions is that you can build a transformer-based chess bot, feed it 0 human games, and train it with RL until it can beat any human. Its play is completely novel and unconstrained by human gameplay (because it never saw any).

You can do that with any task that's verifiable, like coding or math.

(Also, as a separate fact: as long as a task is easier to verify than to solve (basically always), you have something like a million monkeys with typewriters, and with temperature sampling the model might eventually stumble its way onto a solution.)
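The verify-versus-solve asymmetry can be sketched in a few lines of Python. This is a toy under loud assumptions: subset sum stands in for "a verifiable task", and the hypothetical `propose` function is a uniformly random guesser standing in for temperature sampling from a model. It is not how any real system is implemented.

```python
import random


def verify(numbers: list[int], target: int, chosen: list[int]) -> bool:
    """Cheap check: distinct, in-range indices whose values sum to target."""
    return (
        len(set(chosen)) == len(chosen)
        and all(0 <= i < len(numbers) for i in chosen)
        and sum(numbers[i] for i in chosen) == target
    )


def propose(numbers: list[int], rng: random.Random) -> list[int]:
    """Stand-in for one sampled model output: a random subset of indices."""
    return [i for i in range(len(numbers)) if rng.random() < 0.5]


def sample_until_verified(numbers, target, max_tries=100_000, seed=0):
    """Keep sampling candidates until the verifier accepts one.

    Returns (candidate, attempts) on success, or (None, max_tries) if
    no accepted candidate was found within the budget.
    """
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        candidate = propose(numbers, rng)
        if verify(numbers, target, candidate):
            return candidate, attempt
    return None, max_tries


if __name__ == "__main__":
    numbers = [3, 34, 4, 12, 5, 2]
    solution, attempts = sample_until_verified(numbers, target=9)
    print(solution, attempts)
```

The guesser here knows nothing about the problem, so it only works on tiny instances. The argument in the comment is that a trained model is a far better proposer, while the verifier stays just as cheap.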


dehsge 17 days ago

Unify general relativity with quantum mechanics. The continuum hypothesis. The traveling salesman problem in polynomial time.


redox99 17 days ago

I think it's cool how in a decade we went from

"Neural networks will never be able to understand this sentence that's obvious to humans"

to

"LLMs must be able to solve problems that humanity hasn't been able to after almost a century, and that might even be unsolvable"


dehsge 16 days ago

So that is kind of the point of studying maths, right?

Why something is unsolvable or undecidable can be as important as the output of a theorem.

Questions like these, Fields Medal-level problems, or Karp's 21 NP-complete problems are what working mathematicians are interested in.

Will LLMs help as an assistant to humans in the future? Probably.

Will LLMs answer these questions themselves, provide insights and bounds for the new mathematics, and teach other mathematicians why the new math they create is true?

Will these models have PhDs and take on candidates, teaching them how to apply and think about the maths problems they are interested in?


roywiggins 16 days ago

it can operate at the level of a mere mathematics professor, who everyone knows are barely conscious, basically automatons. wake me up when it's Einstein


3uruiueijjj 16 days ago

The continuum hypothesis was proven independent of ZFC over sixty years ago; I think even GPT-2 could have told you that much.


int_19h 17 days ago

I don't see how any of this follows. Yes, the LLMs will learn the "meaning" (here narrowly defined as relative configuration in the embedding space) of vectors that correspond to tokens in whatever tokenizer is used to feed into them. But that vector space is not discrete, and nothing precludes the model from internally operating on other vectors that it never saw in training, based on how they relate to those vectors which it did see.
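A toy Python illustration of that last point, with a made-up three-entry embedding table (the tokens and 3-dimensional vectors are hypothetical, chosen only to keep the arithmetic readable): a point halfway between two token vectors is not the embedding of any token, yet its relation to every known vector is still perfectly well defined.

```python
import math

# Hypothetical embedding table; real models use far more tokens and dimensions.
EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def interpolate(a: list[float], b: list[float], t: float) -> list[float]:
    """Point a fraction t of the way from a to b (t=0 gives a, t=1 gives b)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# A vector that corresponds to no token in the table.
midpoint = interpolate(EMBEDDINGS["cat"], EMBEDDINGS["dog"], 0.5)

if __name__ == "__main__":
    print(midpoint in EMBEDDINGS.values())  # False: not any token's embedding
    for token, vector in EMBEDDINGS.items():
        print(token, cosine(midpoint, vector))
```

Here the midpoint is equally similar to "cat" and "dog" and orthogonal to "car", even though no token maps to it. Whether a trained model actually exploits such intermediate points is an empirical question; the sketch only shows that the space allows it.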
