I think the thing is, though, that with large multimodal models you give them all the data and test them against everything, and they generally do better across most of the benchmarks.
That depends entirely on the use case. For example, if you wanted to build an AI to operate a self-driving car, just training on unlabelled data scraped from the internet is only going to get you so far. The model doesn't learn how to do EVERYTHING (not yet, at least).