Hacker News
by unblough 971 days ago
> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you took the same amount of data used for GPT-3+ but scrambled its tokenization before training, THEN I would agree with you that its current behaviour is surprising. But the model was fed data with large swaths of literal question-and-answer constructions. Its overfitting behaviour is largely why its parent company is facing so much legal backlash.

> Even more mind boggling is the fact that randomness is part of its algorithm

The randomness applies to token choice at sampling time rather than to anything tunable during training, so it fails to support the "i.e. we don’t really know what happened" sentiment. We do know: we told it to flip a coin, and it did.
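To make the coin-flip point concrete, here is a minimal Python sketch, assuming a toy 4-token vocabulary with made-up logits. The names `softmax`, `sample_next_token` and `sample_sequence` are hypothetical and illustrative, not any real model's API. The scores stay fixed; the only randomness is the weighted draw, and seeding the RNG makes that draw reproducible.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities (subtract the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(logits, rng):
    """Weighted 'coin flip': pick one token index in proportion to its probability."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def sample_sequence(logits, seed, n):
    """Draw n tokens with a seeded RNG.

    Simplification (assumption): a real LLM recomputes the logits after every
    token, conditioned on the text so far; here the same toy logits are reused.
    """
    rng = random.Random(seed)
    return [sample_next_token(logits, rng) for _ in range(n)]

# Hypothetical next-token scores for a 4-token vocabulary (illustrative only).
logits = [2.0, 1.0, 0.5, -1.0]

run_a = sample_sequence(logits, seed=42, n=10)
run_b = sample_sequence(logits, seed=42, n=10)
print(run_a == run_b)  # same seed, same coin flips, identical output: True
```

Changing the seed changes which tokens come out but nothing about the trained weights, which is the sense in which this randomness is a known, inference-time choice.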

> i.e. temperature, and that without it the output is kind of meh.

Both with and without it. You can turn the temperature up and get bad results, just as you can turn it down and get bad results.
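Temperature is commonly applied by dividing the logits by T before the softmax. The minimal sketch below assumes hypothetical toy logits and illustrative function names. It only shows the shape of the distribution at both ends of the dial: a small T piles nearly all probability onto the top token (close to greedy, deterministic decoding), while a large T flattens it toward uniform, so low-scoring tokens get picked almost as often as good ones. Whether a given setting actually produces "bad results" depends on the model and the task.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax.

    temperature < 1 sharpens the distribution toward the top token;
    temperature > 1 flattens it toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy decoding")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: near 0 for an almost-certain choice, log(n) for uniform."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores, illustrative only

for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t:>4}: top-token prob={max(probs):.3f}, entropy={entropy(probs):.3f}")
```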

If adding a single additional dimension to the polynomial of the solution space turned a nondeterministic problem into a deterministic one, then yes, I would agree with you that it would be surprising.

> so fails to support the "i.e. we don’t really know what happened" sentiment

It's less that we don't know what's happening at the micro level and more that it's surprising it produces anything coherent at all at the macro level, especially with a (necessary) element of randomness in the process.

For the most part, we don't seem particularly knowledgeable about what happens at the macro level. Hallucinations remain an unsolved problem. AI companies can't even make their "guardrails" bulletproof.

If you believe LLMs are fully explainable, you should write a paper and submit it to arXiv.

I think this is an uncharitable reading of this thread.

I’m arguing against the breathless use of “surprising”.

My GP comment explains what I think you overlooked in this dismissive response.

> to analyze its decisions computationally necessitates similar levels of computation for each decision being made as what was used to compute the weights.

Explainable but intractable is still far from surprising to me.

> It's surprising because it wasn't the intent of LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority.

If you read through what Hinton or any of his famous students have said, it genuinely was and is surprising. Everything from AlexNet to the jump from GPT-2 to GPT-3 was surprising. We can't actually explain that jump formally, only make reasonable guesses. If something is unexplainable, it's unpredictable. Prediction without understanding is a vague guess, so the results come as a surprise.