Hacker News
by londons_explore 1100 days ago
They probably trained all 8 experts on the same data. The experts may have become good at different topics, but no human divided up the topics.

The output isn't just the best of the 8 experts; it is a blend of the experts' opinions. Another (usually smaller) neural net decides how to blend the outputs of the networks together, probably on a per-token basis: for each individual token (roughly, each word), the outputs of all the experts are consulted and blended together, and a word is picked (sampled) before moving on to the next word.
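
To make the blending concrete, here is a minimal Python sketch of what one per-token step could look like. It is only an illustration of the description above, not how any particular model is known to work: the names (`blend_next_token`, `sample_token`), the toy numbers, and the choice to blend final logits with softmax weights are all assumptions. Real systems may blend at other points in the network or consult only a few experts per token.

```python
import math
import random

def softmax(scores):
    """Turn arbitrary scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blend_next_token(expert_logits, gate_scores):
    """Blend each expert's next-token logits using the gate's weights.

    expert_logits: one list of vocabulary logits per expert (hypothetical).
    gate_scores:   one raw score per expert, as produced by the small
                   gating network for the current token (hypothetical).
    """
    weights = softmax(gate_scores)
    vocab_size = len(expert_logits[0])
    blended = [
        sum(w * logits[i] for w, logits in zip(weights, expert_logits))
        for i in range(vocab_size)
    ]
    return softmax(blended), weights

def sample_token(probs, rng):
    """Pick one token index at random, in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Toy example: 3 experts, vocabulary of 4 tokens.
expert_logits = [
    [2.0, 0.0, 0.0, 0.0],  # expert 0 prefers token 0
    [0.0, 2.0, 0.0, 0.0],  # expert 1 prefers token 1
    [0.0, 0.0, 1.0, 1.0],  # expert 2 is split between tokens 2 and 3
]
gate_scores = [3.0, 0.0, 0.0]  # the gate strongly favours expert 0 here

probs, weights = blend_next_token(expert_logits, gate_scores)
token = sample_token(probs, random.Random(0))
```

The gate scores are fixed by hand in this sketch. In a trained system they would be recomputed for every token, so the blend can shift from one expert to another from word to word.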

1 comments

I guess that neural network has to be able to identify the subject and know at every moment which network is the most capable for that subject; otherwise I can't understand how it could possibly evaluate which is the best answer.
The results of this sort of system frequently look almost random to human eyes. For example, one expert might be the "capital letter expert", doing a really good job of putting capital letters in the right place in the output.