Hacker News
by palmer_fox 1022 days ago
All these LLMs are pretty general, if I understand correctly. Are there any efforts to create specialized models (other than for coding)? Or, what would be even better, to "extract" certain areas from existing LLMs as a way to specialize them? The goal would be to drastically reduce model size so that the models can run on less powerful devices.

E.g. a model specializing in chemistry doesn't need to include data on world history or be able to write poetry.

2 comments

I am not an expert, but a specialized model still has to learn human language/grammar/whathaveyou, and that is where scale seems to matter. Fine-tuning on a subset of knowledge after that is typically how domain specialisation is achieved, as I understand it.
Domain specialization is done by continuing the full training process on domain data. Fine-tuning is more for changing the style of the output than for adding new knowledge.
What if the initial training already contains all necessary data for a particular specialization? What would be the benefit of continuing the training process?
Imagine someone tells you how a crime was committed and asks you to summarise it. Now imagine the same question is put to a lawyer. Even if you both knew the same facts, the responses would differ a lot in style, highlighted points, references mentioned, etc. Domain-specific fine-tuning does exactly that. Sure, sometimes you can get very close by changing the prompt to include "respond like a lawyer in situation X with the following extra rules", but not always, and fine-tuning gives better results with a shorter prompt.
I was wondering about that too. Would it be possible in the future to have a more modular approach to LLMs? Have a module that is responsible for basic knowledge/language/grammar and then other more specialized modules that are added selectively.

I don't know enough about fine-tuning, so I'm not sure whether the process is capable of removing "unused" parts of the model (I guess it isn't, similar to un-learning).

There are various methods for shrinking a model and shedding what it doesn't need, like distillation. The idea is generally that you always lose some performance, but hopefully you lose proportionately more size/run cost than performance.
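To make the distillation idea a bit more concrete, here is a minimal sketch of the soft-target loss commonly used when a small "student" model is trained to imitate a large "teacher". It is plain Python on toy logits, not anyone's actual training code; the function names, the example logits, and the temperature value are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature ** 2 factor is the usual scaling that keeps gradient
    magnitudes comparable across different temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical logits over three tokens, for illustration only.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

A student whose output distribution is close to the teacher's gets a small loss, and one that disagrees gets a larger one. In practice this term is usually combined with the ordinary training loss, and the student is simply a smaller network, which is where the size and run cost savings come from.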
So, so many. There are RAG-specific models (Contextual AI), finance-specific models (BloombergGPT, Brightwave), contact center models (Cresta), even telco models (Anthropic).
Very interesting. Thanks for replying!