by elcomet 559 days ago
It's possible; the question is how to choose which submodel handles a given query.

You can use a dedicated LLM, or a larger general-purpose LLM, to do this routing.
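
To make the routing idea concrete, here's a rough sketch. Everything in it is made up for illustration: the three submodels are stub functions, and `route` uses keyword matching as a stand-in for what would really be an LLM call that returns a label.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for real submodels; each maps a prompt to a reply.
def code_model(prompt: str) -> str:
    return f"[code model] {prompt}"

def math_model(prompt: str) -> str:
    return f"[math model] {prompt}"

def general_model(prompt: str) -> str:
    return f"[general model] {prompt}"

SUBMODELS: Dict[str, Callable[[str], str]] = {
    "code": code_model,
    "math": math_model,
    "general": general_model,
}

def route(prompt: str) -> str:
    """Toy router. Assumption: a real one would ask an LLM for the label."""
    text = prompt.lower()
    if any(word in text for word in ("python", "function", "bug")):
        return "code"
    if any(word in text for word in ("integral", "prove", "equation")):
        return "math"
    return "general"

def answer(prompt: str) -> str:
    label = route(prompt)
    # Fall back to the general model if the router returns an unknown label.
    return SUBMODELS.get(label, general_model)(prompt)
```

The fallback matters more once the router is itself an LLM, since it can return labels you didn't plan for.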

Also, some work suggests using smaller LLMs to generate multiple responses and a stronger, larger model to rank them (ranking responses is much more efficient than generating them).
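
A minimal sketch of that generate-then-rank pattern, with the caveat that both models are faked: `toy_generate` returns canned strings in place of sampling from a small LLM, and `toy_score` counts word overlap in place of a large model's ranking score. All names are hypothetical.

```python
from typing import Callable

def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Draw n candidates from the small model, keep the one the ranker scores highest."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda cand: score(prompt, cand))

def toy_generate(prompt: str, seed: int) -> str:
    # Stand-in for a small LLM: canned answers selected by seed.
    canned = [
        "I don't know.",
        "Paris.",
        "The capital of France is Paris.",
        "France is in Europe.",
    ]
    return canned[seed % len(canned)]

def toy_score(prompt: str, candidate: str) -> float:
    # Stand-in for a large ranking model: number of words shared with the prompt.
    prompt_words = set(prompt.lower().strip("?.").split())
    candidate_words = set(candidate.lower().strip("?.").split())
    return float(len(prompt_words & candidate_words))

best = best_of_n("What is the capital of France?", toy_generate, toy_score)
```

With a real ranker you'd swap `toy_score` for a call that scores each (prompt, candidate) pair; the selection logic stays the same.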