Hacker News
by somenameforme 17 days ago
It poses a simple problem. Go back not that long ago in human history and language didn't even exist: our expressed token base was practically zero. We went from that to discovering the secrets of the atom, putting a man on the Moon, and more. If you put LLMs at that starting point, they're going to do nothing but endlessly cycle over basically nothing. If you give them an infinite amount of time and processing, that wouldn't change.

This same issue simultaneously demonstrates that humans are nothing at all like token predictors. No matter how much time you spend remixing the tokens of primitive man, you don't get 'and here is how you land on the Moon' from it.
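To make the "remixing" argument concrete, here is a toy sketch, assuming a bigram model stands in for a "token predictor" (real LLMs are vastly more complex, and the corpus below is invented for illustration). The point it shows is narrow: a model that only samples from observed continuations can only emit tokens it has already seen.

```python
import random
from collections import defaultdict


def train_bigram(corpus: list[str]) -> dict[str, list[str]]:
    """Record, for each token, every token observed immediately after it."""
    successors: dict[str, list[str]] = defaultdict(list)
    for prev, nxt in zip(corpus, corpus[1:]):
        successors[prev].append(nxt)
    return dict(successors)


def generate(model: dict[str, list[str]], start: str, length: int, rng: random.Random) -> list[str]:
    """Sample a sequence by repeatedly picking an observed successor."""
    out = [start]
    for _ in range(length - 1):
        options = model.get(out[-1])
        if not options:  # dead end: nothing ever followed this token
            break
        out.append(rng.choice(options))
    return out


# Hypothetical, tiny "primitive" corpus.
corpus = ["fire", "hot", "fire", "food", "eat", "fire", "hot"]
model = train_bigram(corpus)
sample = generate(model, "fire", 20, random.Random(0))
print(sample)  # every token here already appears in the corpus
```

However long it runs, the output vocabulary stays a subset of the training vocabulary.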

A token is not a clump of letters. Each token becomes a multidimensional initial input vector that the model then tweaks and transforms. GPT doesn't think in tokens; it just accepts them as input. (It will also happily accept any other vectors lying between the ones that represent tokens, and searching for the best prompt for a given task as input vectors rather than as tokens is a legitimate prompt optimization strategy.)
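A minimal sketch of that idea, assuming a hypothetical 3-token vocabulary with hand-picked 3-dimensional embeddings (real models learn an embedding matrix with hundreds or thousands of dimensions). Tokens enter only as looked-up vectors, and a point between two token vectors is an equally valid input even though no token maps to it; optimizing such continuous vectors directly is usually called prompt tuning or soft prompting.

```python
# Hypothetical embedding table: token -> input vector (values invented for illustration).
embedding = {
    "moon":   [1.0, 0.0, 0.5],
    "rocket": [0.0, 1.0, 0.5],
    "cave":   [0.5, 0.5, -1.0],
}


def embed(tokens: list[str]) -> list[list[float]]:
    """Tokens reach the model only as rows looked up from the embedding table."""
    return [embedding[t] for t in tokens]


def lerp(a: list[float], b: list[float], t: float) -> list[float]:
    """Linear interpolation: a point between two token vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


hard_prompt = embed(["moon", "rocket"])

# A "soft prompt" vector halfway between "moon" and "rocket".
# No token corresponds to it, but it is a perfectly valid model input.
soft_vector = lerp(embedding["moon"], embedding["rocket"], 0.5)
soft_prompt = [soft_vector] + embed(["cave"])
print(soft_vector)  # [0.5, 0.5, 0.5]
```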

It also outputs vectors that are coerced into tokens for human consumption.
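The output side can be sketched the same way. This assumes a hypothetical toy vocabulary and that the output layer reuses the input embeddings (weight tying, which many but not all models do). The final continuous vector is scored against every token, the scores are turned into probabilities with a softmax, and greedy decoding picks the single likeliest token.

```python
import math

# Hypothetical toy embeddings, reused as the output projection (weight tying assumed).
embedding = {
    "moon":   [1.0, 0.0, 0.5],
    "rocket": [0.0, 1.0, 0.5],
    "cave":   [0.5, 0.5, -1.0],
}


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def decode(hidden: list[float]) -> tuple[str, dict[str, float]]:
    """Coerce a continuous output vector into one token via softmax + argmax."""
    logits = {tok: dot(hidden, vec) for tok, vec in embedding.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return max(probs, key=probs.get), probs


hidden = [0.9, 0.2, 0.4]  # hypothetical final-layer output vector
token, probs = decode(hidden)
print(token)  # moon
```

Everything before the final `max` is continuous; only the last step throws information away to produce a human-readable token.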

Yes, it goes through tokens, but the possible internal meanings assigned to those tokens (depending on the tokens around them) are practically infinite.
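A toy illustration of context shaping meaning, assuming a stripped-down self-attention with no learned query/key/value projections and no 1/sqrt(d) scaling (real transformers have both, plus many layers). The token "bank" starts from the same vector every time, but its output vector depends on its neighbours. The 2-dimensional embeddings are invented for illustration.

```python
import math

# Hypothetical 2-D embeddings.
embedding = {
    "bank":  [1.0, 1.0],
    "river": [0.0, 2.0],
    "money": [2.0, 0.0],
}


def attend(tokens: list[str]) -> list[list[float]]:
    """Each output is a softmax-weighted average of all input vectors,
    weighted by dot-product similarity to the current token."""
    xs = [embedding[t] for t in tokens]
    outputs = []
    for q in xs:
        scores = [sum(a * b for a, b in zip(q, k)) for k in xs]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[i] for w, v in zip(weights, xs)) for i in range(len(q))])
    return outputs


bank_near_river = attend(["river", "bank"])[1]
bank_near_money = attend(["money", "bank"])[1]
print(bank_near_river)  # [0.5, 1.5]
print(bank_near_money)  # [1.5, 0.5]
```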

That's how humans got from caves to where we are now: by associating new meanings with the same old sound clumps.

> If you give them an infinite amount of time and processing, that wouldn't change.

Hrm, I actually doubt it. LLMs are capable of discovery, as recent math news showed. That suggests a "society" of LLMs could likely make progress.

Only by having the LLM random-walk the hypothesis space while a validator rejects the invalid hypotheses.
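A minimal sketch of that propose-and-validate loop. The domain is hypothetical: hypotheses are integers, the `propose` function stands in for the LLM, and `is_valid` stands in for a validator such as a proof checker. The walk moves only when a candidate passes validation; rejected candidates are discarded.

```python
import random
from typing import Callable


def random_walk_search(
    start: int,
    propose: Callable[[int, random.Random], int],
    is_valid: Callable[[int], bool],
    steps: int,
    rng: random.Random,
) -> set[int]:
    """Random-walk a hypothesis space, keeping only candidates the validator accepts."""
    current = start
    found: set[int] = set()
    for _ in range(steps):
        candidate = propose(current, rng)
        if is_valid(candidate):
            found.add(candidate)
            current = candidate  # move only onto a validated hypothesis
        # otherwise the candidate is rejected and the walk stays put
    return found


# Hypothetical proposer: a random step from the current hypothesis.
def propose(current: int, rng: random.Random) -> int:
    return current + rng.randint(-10, 10)


# Hypothetical validator: accepts multiples of 7.
def is_valid(h: int) -> bool:
    return h % 7 == 0


found = random_walk_search(0, propose, is_valid, steps=1000, rng=random.Random(42))
print(sorted(found))
```

Note that the validator does all the filtering; the proposer's only job is to land on valid candidates often enough for the search to be affordable.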

The reason LLM hypotheses are any good is that the LLM has already consumed a civilization's worth of knowledge. You couldn't have bootstrapped such a system with nothing but a few priors/axioms and let it discover the universe.
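A rough way to quantify why the prior matters, assuming for illustration only that the proposer draws hypotheses independently (not a true random walk) and the validator is perfect. If each draw is valid with probability $p$, the number of draws $T$ until the first valid hypothesis is geometric. The values of $N$, $K$ and $q$ below are invented.

```latex
\Pr[T = t] = (1-p)^{t-1}\,p, \qquad
\mathbb{E}[T] = \sum_{t \ge 1} t\,(1-p)^{t-1}\,p = \frac{1}{p}

% Uniform prior over N hypotheses, K of them valid:
p = \frac{K}{N} \;\Rightarrow\; \mathbb{E}[T] = \frac{N}{K},
\quad \text{e.g. } N = 10^{12},\ K = 10 \;\Rightarrow\; \mathbb{E}[T] = 10^{11}

% Informed prior that puts total mass q on the valid set:
p = q = 0.01 \;\Rightarrow\; \mathbb{E}[T] = 100
```

The validator is identical in both cases; the nine orders of magnitude come entirely from how much the proposer already "knows".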

Well, yes: LLMs need a rich and favorable substrate to grow and learn (or, we might say, to bootstrap).

Just as DNA needs a specific substrate (a cell with ribosomes and other machinery), and just as humans do (they need an oxygen atmosphere, food, parents).

But in the world we live in, the existence of a favorable substrate for humans or LLMs is a given. It is _already_ bootstrapped. Can we infer anything about LLMs' limits, or their potential to achieve AGI, from their bootstrapping requirements?

Only through direction from a mind.