|
Try doing a head to head comparison using all LLM tricks available including prompt engineering, rag, reasoning, inference time compute, multiple agents, tools, etc Then try the same thing using fine tuning. See which one wins. In ML class we have labeled datasets with breeds of dogs hand labeled by experts like Andrej, in real life users don’t have specific, clearly defined, and high quality labeled data like that. I’d be interested to be proven wrong I think it is easy for strong ML teams to fall into this trap because they themselves can get fine tuning to work well. Trying to scale it to a broader market is where it fell apart for us. This is not to say that no one can do it. There were users who produced good models. The problem we had was where to consistently find these users who were willing to pay for infrastructure. I’m glad we tried it, but I personally think it is beating a dead horse/llama to try it today |