Hacker News
by janalsncm 851 days ago
My understanding of GPT-4 is that it is a mixture of experts. In other words, multiple GPT-3.5 models respond to the same prompt in parallel, and another model on top chooses the best response among them.

So in that case, running more models could give a better response, but at the cost of more compute.
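Here is a minimal Python sketch of the selection scheme described above. The `generate` and `score` functions are hypothetical stand-ins (no real model is called), and `best_of_n` is an illustrative name. The sketch shows two things: compute grows linearly with the number of candidates `n`, and nothing can be returned until every candidate is complete.

```python
from typing import Callable, List, Tuple

def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],  # hypothetical: (prompt, seed) -> full response
    score: Callable[[str, str], float],   # hypothetical: (prompt, response) -> quality score
    n: int,
) -> Tuple[str, int]:
    """Generate n complete responses and return the highest-scoring one.

    Also returns the number of generate calls, so the compute cost is visible.
    """
    candidates: List[str] = [generate(prompt, seed) for seed in range(n)]
    # Selection needs every candidate finished first, so no output can be streamed early.
    best = max(candidates, key=lambda response: score(prompt, response))
    return best, len(candidates)

# Toy stand-ins for illustration only.
def toy_generate(prompt: str, seed: int) -> str:
    return f"{prompt} answer #{seed}"

def toy_score(prompt: str, response: str) -> float:
    # Pretend that later seeds happen to produce better answers.
    return float(response.rsplit("#", 1)[1])

best, calls = best_of_n("Q:", toy_generate, toy_score, n=4)
print(best, calls)  # Q: answer #3 4
```

Doubling `n` doubles the number of full generations, which is the compute trade-off the comment refers to.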

1 comment

Where did you get that understanding? This doesn't really make sense: how would GPT be able to stream one token at a time in the first place?
There's actually information produced during token generation that acts as a level of confidence: the probability the model assigns to each candidate token.

You can definitely stream and still choose the highest-scoring option among a few attempts at generating the best next-token candidate.
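Here is a minimal Python sketch of that idea. It assumes a hypothetical `next_token_candidates` function that returns a few candidate next tokens with their log-probabilities. Real APIs expose similar per-token scores, but this function is not any particular library's API. At each step the highest-scoring candidate is chosen and yielded immediately, so output still streams one token at a time. Always taking the maximum is plain greedy decoding, and it selects among next-token candidates, not among whole responses.

```python
from typing import Callable, Iterator, List, Tuple

Candidate = Tuple[str, float]  # (token, log-probability)

def stream_best_tokens(
    prompt: str,
    next_token_candidates: Callable[[str], List[Candidate]],  # hypothetical model call
    max_tokens: int,
    stop: str = "<eos>",
) -> Iterator[str]:
    """Yield one token at a time, choosing the highest-scoring candidate at each step."""
    text = prompt
    for _ in range(max_tokens):
        candidates = next_token_candidates(text)
        token, _logprob = max(candidates, key=lambda c: c[1])
        if token == stop:
            break
        yield token  # emitted right away, which is what makes streaming possible
        text += token

# Toy stand-in: a fixed table of candidates keyed by the text generated so far.
TABLE = {
    "Hi": [(" there", -0.2), (" you", -1.9)],
    "Hi there": [("!", -0.1), (".", -2.5)],
    "Hi there!": [("<eos>", -0.05), (" How", -3.0)],
}

def toy_candidates(text: str) -> List[Candidate]:
    return TABLE.get(text, [("<eos>", 0.0)])

print(list(stream_best_tokens("Hi", toy_candidates, max_tokens=10)))
# [' there', '!']
```

Each token reaches the caller before the next one is computed, unlike whole-response selection, which has to wait for every candidate to finish.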