Hacker News
by CamperBob2 5 days ago
That's a fairly obvious idea, not dumb at all, but unfortunately it doesn't seem to pan out. Trying to specialize an LLM in one area harms its 'cognition' in all areas. For instance, if you train a coding model without all the Shakespeare and soap operas and Wikipedia and pirated Stephen King books and ancient Roman history and whatever, you end up with a worse coding model.

I'm not sure anyone really understands why.


The article is not backed up by reality. Why would anyone use anything but a domain-specific LLM, if they actually worked?

The author is probably confusing RAG with pretraining. You can RAG on PubMed but you can't arrive at a competitive model by pretraining solely on it.