HN Mirror


	by londons_explore 1100 days ago
	They probably trained all 8 experts on the same data. The experts may have become good at different topics, but no human divided up the topics. The output isn't just the best of the 8 experts; it is a blend of the experts' opinions. Another (usually smaller) neural net decides how to blend the outputs of the networks, probably on a per-token basis (i.e. for each individual token, roughly a word, the outputs of all the experts are consulted and blended together, and a token is picked (sampled) before moving on to the next one).
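	A rough sketch of that per-token blending, in case it helps. This is a toy under stated assumptions, not how any particular model does it: the names `blend_experts` and `sample_token` are made up for illustration, the experts are reduced to hand-written lists of next-token scores, and the gate is just a list of scores rather than a trained network. Real sparse mixture-of-experts layers often route each token to only a few experts instead of consulting all of them.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_experts(expert_logits, gate_scores):
    """Weighted blend of every expert's next-token scores (hypothetical helper).

    expert_logits: one list of vocabulary scores per expert.
    gate_scores:   one score per expert, as a gating net might produce.
    """
    weights = softmax(gate_scores)
    vocab_size = len(expert_logits[0])
    blended = [0.0] * vocab_size
    for weight, logits in zip(weights, expert_logits):
        for i, value in enumerate(logits):
            blended[i] += weight * value
    return weights, blended


def sample_token(logits, rng):
    """Sample one token index from the blended scores (hypothetical helper)."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy setup (assumed values): 2 experts, vocabulary of 3 tokens.
expert_logits = [
    [2.0, 0.0, 0.0],  # expert 0 prefers token 0
    [0.0, 2.0, 0.0],  # expert 1 prefers token 1
]
gate_scores = [0.0, 0.0]  # the gate trusts both experts equally here

weights, blended = blend_experts(expert_logits, gate_scores)
rng = random.Random(0)  # seeded so the run is repeatable
token = sample_token(blended, rng)
print(weights, blended, token)
```

	For generation you would repeat this for every token: recompute the gate scores, blend, sample, move on.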

1 comments

mrfinn 1100 days ago

I guess that neural network has to be capable of identifying the subject and knowing at every moment which network is the most capable for that subject; otherwise I can't understand how it could possibly evaluate which is the best answer.


londons_explore 1100 days ago

Results of this sort of system frequently look almost random to human eyes. For example, one expert might be the "capital letter expert", doing a really good job of putting capital letters in the right place in the output.
