| HN Mirror


	by jamala1 810 days ago
	I guess it's the difference between an ensemble and a mixture of experts, i.e. aggregating outputs from one or more models trained on the same data vs. models trained on different data (GPT-4). Though GPT-4 presumably doesn't aggregate; it routes.
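
To make the aggregate-vs-route distinction concrete, here is a rough toy sketch. Everything in it is made up for illustration: the "experts" are plain functions, `gate` is a hypothetical scoring function, and `ensemble_predict` / `moe_predict` are names invented here. It says nothing about how GPT-4 actually works.

```python
import math

def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_predict(models, x):
    """Ensemble: run every model on x and average the outputs."""
    outputs = [model(x) for model in models]
    return sum(outputs) / len(outputs)

def moe_predict(experts, gate, x, top_k=1):
    """Mixture of experts: the gate scores the experts and only the
    top_k highest-scoring ones are evaluated (routing)."""
    weights = softmax(gate(x))
    ranked = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(weights[i] for i in chosen)
    # Renormalize over the chosen experts; the others are never called.
    return sum(weights[i] / norm * experts[i](x) for i in chosen)

# Toy experts and a hypothetical gate (assumptions for illustration only).
experts = [lambda x: 2 * x, lambda x: x + 10, lambda x: -x]
gate = lambda x: [x, 0.0, -x]

ensemble_out = ensemble_predict(experts, 3.0)  # averages all three outputs
moe_out = moe_predict(experts, gate, 3.0)      # only expert 0 is evaluated
```

With `top_k=1` the gate is a hard router: one expert runs per input and the rest cost nothing, whereas the ensemble always pays for every model.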