Hacker News
by fbnbr 1002 days ago
I think smaller expert models will dominate the majority of applications. There is a fine balance to strike between size and usability, and there will be many mechanisms, like the one demonstrated in the post, for finding that optimum and realizing it.

And for a large general model, just have multiple small expert models and a relay model that decides which domain-specific model to ask.
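To make the relay idea concrete, here is a minimal sketch. Everything in it is hypothetical: the expert functions are stand-ins for small domain models, and the keyword overlap in `relay` is a placeholder for what would presumably be a learned classifier.

```python
import re
from typing import Callable, Dict, Set

# Hypothetical stand-ins for small domain-specific expert models.
def math_expert(query: str) -> str:
    return f"[math] {query}"

def law_expert(query: str) -> str:
    return f"[law] {query}"

def general_expert(query: str) -> str:
    return f"[general] {query}"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "math": math_expert,
    "law": law_expert,
    "general": general_expert,
}

# Assumed keyword lists; a real relay would learn the routing.
KEYWORDS: Dict[str, Set[str]] = {
    "math": {"integral", "prime", "equation"},
    "law": {"contract", "liability", "statute"},
}

def relay(query: str) -> str:
    """Return the domain whose keywords overlap the query most, else 'general'."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {domain: len(words & kws) for domain, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def answer(query: str) -> str:
    """Route the query to exactly one expert and return its reply."""
    return EXPERTS[relay(query)](query)
```

Because `relay` picks exactly one expert, a query that spans two domains still goes to a single model, which is where this design gets harder.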
It would get more complex with cross domain questions. Part of what makes LLMs feel magical is their ability to synthesize such disparate topics and information into coherent “thoughts.”

I would suspect what you have instead is a single model attached to data sources, where the model doesn’t have to hold so many compressed facts and can instead rely on higher-level summaries.

I did a senior reading/research class in AI back in 1989, as the final class I needed to graduate with a BSCS. The idea originally was to do a survey of the methods known at the time and determine which was the best. This included things like EMYCIN, the first generalized expert system toolkit.

I ended up with the premise that each of them had its relative strengths and weaknesses, and that it would actually be best to use all of them, but only in their own areas of strength. They would then use something akin to a shared blackboard, where they could all read the results from the other systems and write their own results as well. My claim was that the sum of all the available algorithms working together, each in the area in which it was best, would produce better outcomes than any one algorithm could achieve on its own.
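For anyone who hasn't seen a blackboard setup, a toy sketch of the shape is below. The three "knowledge sources" and the `run` control loop are made up for illustration and are not taken from any particular 1980s system: each source checks the shared state and only writes when its precondition holds.

```python
from typing import Any, Callable, Dict, List, Tuple

Blackboard = Dict[str, Any]
# A knowledge source is (name, precondition, action); all names are hypothetical.
Source = Tuple[str, Callable[[Blackboard], bool], Callable[[Blackboard], None]]

def tokenize(bb: Blackboard) -> None:
    bb["tokens"] = bb["text"].split()

def count(bb: Blackboard) -> None:
    bb["count"] = len(bb["tokens"])

def summarize(bb: Blackboard) -> None:
    first = bb["tokens"][0] if bb["tokens"] else ""
    bb["summary"] = f"{bb['count']} tokens, first: {first}"

# Deliberately listed out of order: the blackboard state, not the list order,
# decides which source is able to contribute next.
SOURCES: List[Source] = [
    ("summarizer", lambda bb: "count" in bb and "summary" not in bb, summarize),
    ("counter", lambda bb: "tokens" in bb and "count" not in bb, count),
    ("tokenizer", lambda bb: "text" in bb and "tokens" not in bb, tokenize),
]

def run(bb: Blackboard, sources: List[Source], max_rounds: int = 10) -> List[str]:
    """Let sources read and write the blackboard until nobody can contribute."""
    trace: List[str] = []
    for _ in range(max_rounds):
        progressed = False
        for name, ready, act in sources:
            if ready(bb):
                act(bb)
                trace.append(name)
                progressed = True
        if not progressed:
            break
    return trace
```

Calling `run({"text": "shared blackboard demo"}, SOURCES)` fires the tokenizer, then the counter, then the summarizer, even though they are registered in the opposite order.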

My professor was not impressed. I only got a C.

Now, the story of how my dad had to work his ass off to finally get the College of Engineering to force the professor to actually give me a grade that he owed me, when all the professor really wanted to do was focus on his new job at one of the big airlines -- well, that's a story for another time.

This approach would remove one of the main benefits - the ability to run multi-task one-shot prompts where a single LLM call returns answers to multiple NLP tasks.
It's not a likely solution given how loss functions work, but in theory a single model could learn to perform exactly the function you describe. When you say "just do X" where X is any function (in this case, a piecewise function), a large enough model could do it.
After some reflection, it's maybe more accurate to visualize this in reverse: all expert models see the problem and attempt a solution, and then some "manager" model decides which expert model has the best solution and outputs it.
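A rough sketch of that reversed arrangement, with a square-root problem as the toy task. The expert functions and the residual-based `score` are assumptions for illustration; a real manager would presumably be a learned verifier rather than a hand-written check.

```python
from typing import Callable, Dict, Tuple

# Hypothetical experts: each one attempts the same problem (estimate sqrt(x)).
def halving_expert(x: float) -> float:
    return x / 2.0

def rounding_expert(x: float) -> float:
    return float(round(x ** 0.5))

def newton_expert(x: float) -> float:
    guess = x if x > 0 else 1.0
    for _ in range(30):  # Newton's iteration for g*g = x
        guess = 0.5 * (guess + x / guess)
    return guess

EXPERTS: Dict[str, Callable[[float], float]] = {
    "halving": halving_expert,
    "rounding": rounding_expert,
    "newton": newton_expert,
}

def score(x: float, candidate: float) -> float:
    """Assumed manager criterion: a smaller residual |c*c - x| scores higher."""
    return -abs(candidate * candidate - x)

def manager_select(x: float) -> Tuple[str, float]:
    """Every expert answers; the manager returns the best-scoring (name, answer)."""
    candidates = [(name, expert(x)) for name, expert in EXPERTS.items()]
    return max(candidates, key=lambda c: score(x, c[1]))
```

The obvious cost is that every expert runs on every query, whereas a router in front runs only one.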
Until the manager model decides to outsource the expert models.
In theory you need two layers to model any function. In practice this is wildly different.
Any memoryless continuous function between two Euclidean spaces, I think you mean. The experts-and-manager model would need to be able to do more than that (as do most neural networks).

And part of the reason why single-hidden-layer networks aren't enough even in continuous memoryless Euclidean cases is, again, because of how loss functions work; you're unlikely to converge on a good approximation with very few hidden layers.

I wrote about this a few months ago, right before OpenAI released their Plugin feature: https://faingezicht.com/articles/2023/03/02/federated-langua...

TLDR: the essay explores how LLMs could evolve into front-end routers that connect users with specialized tools, leading to a future where federated models determine the best-suited system to answer specific queries. Not too different from today's federated search approaches.