by jiggawatts 943 days ago
Everything you said about LLMs being "terrible at X" is true of the current generation of LLM architectures.

From the sound of it, this Q* model has a fundamentally different architecture, which will almost certainly make some of those issues not terrible any more.

Most likely, the Q* design is very similar to the one suggested recently by one of the Google AI teams: doing a tree search instead of greedy next-token selection.

Essentially, current-gen LLMs predict a sequence of tokens: A->B->C->D, etc... where the next "E" token depends on {A,B,C,D} and then is "locked in". While we don't know exactly how GPT4 works, reading between the lines of the leaked info it seems that it evaluates 8 or 16 of these sequences in parallel, then picks the best overall sequence. On modern GPUs, small workloads waste the available compute power because of scheduling overheads, so "doing redundant work" is basically free up to a point. This gives GPT4 a "best 1 of 16" output quality improvement.
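To make the "best 1 of 16" idea concrete, here's a rough Python sketch. Everything in it is made up for illustration: the `NEXT` table is a toy stand-in for a model, and `sample_sequence` / `best_of_n` are hypothetical names. It also samples each token at random rather than picking greedily, purely so that the 16 candidates can differ from each other. This is not a claim about how GPT4 actually does it.

```python
import math
import random

# Toy stand-in for an LLM: next-token probabilities depend only on the
# previous token. The numbers are invented for illustration.
NEXT = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.1, "B": 0.9},
}

def sample_sequence(rng, length):
    """Sample one sequence token by token; return (tokens, total log-probability)."""
    prev, tokens, logp = "<s>", [], 0.0
    for _ in range(length):
        dist = NEXT[prev]
        choices = list(dist)
        tok = rng.choices(choices, weights=[dist[c] for c in choices])[0]
        logp += math.log(dist[tok])
        tokens.append(tok)
        prev = tok  # once chosen, the token is "locked in"
    return tokens, logp

def best_of_n(n, length, seed=0):
    """Draw n independent sequences and keep the one with the highest log-probability."""
    rng = random.Random(seed)
    samples = [sample_sequence(rng, length) for _ in range(n)]
    return max(samples, key=lambda s: s[1])

if __name__ == "__main__":
    print(best_of_n(1, 8))   # a single left-to-right sample
    print(best_of_n(16, 8))  # best of 16 independent samples
```

Note that each candidate is still generated strictly left to right; only the final pick looks at the whole sequence.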

That's great, but each option is still a linear greedy search individually. Especially for longer outputs, the chance of a "mis-step" at some point goes up a lot, and then the AI has no chance to correct itself. All 16 of the alternatives could have a mistake in them, and now it's got to choose between 16 mistakes.
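A back-of-the-envelope version of that argument, as a Python sketch. It assumes (a big simplification, and my assumption rather than anything measured) that every token has the same independent chance `p_step` of being a mis-step, and that the 16 candidates fail independently of each other. The function names are hypothetical.

```python
def p_any_misstep(p_step, n_tokens):
    """Chance that a sequence of n_tokens contains at least one mis-step."""
    return 1.0 - (1.0 - p_step) ** n_tokens

def p_all_candidates_flawed(p_step, n_tokens, k):
    """Chance that all k independently generated candidates contain a mis-step."""
    return p_any_misstep(p_step, n_tokens) ** k

if __name__ == "__main__":
    # Compare a short output with a long one at a 1% per-token error rate.
    for n in (100, 1000):
        print(n, p_any_misstep(0.01, n), p_all_candidates_flawed(0.01, n, 16))
```

Under these assumptions, a 1% per-token error rate gives roughly a 63% chance of at least one mis-step in 100 tokens, and for 1000-token outputs it is close to certain that every one of the 16 candidates has a mistake somewhere.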

It's as if you were trying to write a maths proof, asked 16 students, and instructed them to not cooperate and write their proof left-to-right, top-to-bottom without pausing, editing, or backtracking in any way! I'd like to see how "smart" humans would be at maths under those circumstances.

This Q* model likely does what Google suggested: do a tree search instead of a strictly linear search. At each step, the next token is presented as a list of "likely candidates" with probabilities assigned to each one. Simply pick the "top n" instead of the "top 1", branch for a bit like that, and then prune based on the best overall confidence instead of the best next-token confidence. This would allow a low-confidence next token to be selected, as long as it leads to a very good overall result. Pruning bad branches is also effectively the same as back-tracking. It allows the model to explore but then abandon dead ends instead of being "forced" to stick with bad chains of thought.
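The "top n, then prune on overall confidence" idea is basically plain beam search, so here's a minimal Python sketch of that. To be clear, this is ordinary beam search over an invented toy table (`NEXT`, with made-up probabilities), not whatever Q* actually does, which nobody outside OpenAI knows. The table only covers two steps.

```python
import math

# Toy next-token table (invented numbers): previous token -> {next token: probability}.
NEXT = {
    "<s>": {"A": 0.6, "B": 0.4},
    "A": {"C": 0.5, "D": 0.5},
    "B": {"E": 0.9, "F": 0.1},
}

def beam_search(beam_width, steps):
    """Keep the beam_width best partial sequences, ranked by total log-probability."""
    beams = [(0.0, ["<s>"])]  # (log-probability so far, tokens so far)
    for _ in range(steps):
        candidates = []
        for logp, tokens in beams:
            for tok, p in NEXT[tokens[-1]].items():
                candidates.append((logp + math.log(p), tokens + [tok]))
        # Prune on the score of the whole sequence, not just the latest token.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_logp, best_tokens = beams[0]
    return best_tokens[1:], math.exp(best_logp)

# beam_width=1 is plain greedy decoding; beam_width=2 keeps one alternative alive.
greedy_tokens, greedy_p = beam_search(beam_width=1, steps=2)
beam_tokens, beam_p = beam_search(beam_width=2, steps=2)

if __name__ == "__main__":
    print(greedy_tokens, greedy_p)
    print(beam_tokens, beam_p)
```

With this table, greedy commits to "A" because 0.6 beats 0.4, and ends up with a sequence of probability 0.30. The width-2 beam keeps the lower-confidence "B" around for one more step and finds B, E at 0.36. The "A" branch gets dropped at the pruning step, which is the back-tracking effect described above.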

What's especially scary -- the type of scary that would result in a board of directors firing an overly commercially-minded CEO -- is that naive tree searches aren't the only option! Google showed that you can train a neural network to get better at tree search itself, making it far more efficient at selecting likely branches and pruning dead ends very early. If you throw enough compute power at this, you can make an AI that can beat the world chess champion, the world's best Go player, etc...

Now apply this "AI-driven tree search" to an LLM and... oh boy, now you're cooking with gas!

But wait, there's more: GPT 3.5 and 4.0 were trained with either no synthetically generated data, or very little as a percentage of their total input corpus.

You know what is really easy to generate synthetic training data for? Maths problems, that's what.

Even problems at the level of "solve this hideous integral that would take a human weeks with pen and paper" can be bulk generated using computer algebra software like Wolfram Mathematica or whatever, and then fed in as training data.
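The usual trick for this kind of data is to generate the answer first and work backwards to the question. Here's a small Python sketch that does it for polynomial integrals using only the standard library, as a stand-in for a real computer algebra system. All names (`random_polynomial`, `make_problem`, etc.) are hypothetical, and a real pipeline would obviously need far nastier functions than polynomials.

```python
import random

def random_polynomial(rng, max_degree=4, max_coeff=9):
    """Return integer coefficients [c0, c1, ...] meaning c0 + c1*x + c2*x^2 + ..."""
    degree = rng.randint(1, max_degree)
    coeffs = [rng.randint(-max_coeff, max_coeff) for _ in range(degree)]
    coeffs.append(rng.randint(1, max_coeff))  # nonzero leading coefficient
    return coeffs

def differentiate(coeffs):
    """Differentiate a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def to_text(coeffs):
    """Render coefficients as plain text; negative terms appear as '+ -3*x^2'."""
    terms = [f"{c}*x^{k}" for k, c in enumerate(coeffs) if c != 0]
    return " + ".join(terms) if terms else "0"

def make_problem(rng):
    """Pick the answer first, then differentiate it to obtain the integrand."""
    answer = random_polynomial(rng)
    answer[0] = 0  # the constant of integration is written as "+ C" instead
    integrand = differentiate(answer)
    return {
        "prompt": f"Integrate with respect to x: {to_text(integrand)}",
        "answer": f"{to_text(answer)} + C",
    }

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_problem(rng))
```

Differentiating is mechanical and cheap, so every generated pair is correct by construction, and you can crank out as many as you have disk space for.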

If they cranked out a few terabytes of randomly generated maths problems and trained a tree-searching LLM that has more weights than GPT4, I can picture it being able to solve pretty much any maths problem you can throw at it. Literally anything Mathematica could do, except with English prompting!

Don't be so confident in the superiority of the human mind. We all thought Chess was impossible for computers until it wasn't. Then we all moved the goal posts to Go. Then English text. And now... mathematics.

Good luck with holding on to that crown.

2 comments

> We all thought Chess was impossible for computers until it wasn't.

I don't know who 'we' is, but chess was being programmed for computers before sufficiently powerful computers existed, with the "hardware" played by people computing the next move by hand.

https://en.wikipedia.org/wiki/Turochamp

The point was to not overvalue the superiority of humans, not that chess engines didn't exist.

I immediately thought of A* path finding. I'm pretty sure Q* is the LLM "equivalent", much like you describe.