Hacker News
by cubefox 490 days ago
This suggests that fine-tuning a base model, whether with supervised learning (SL) or reinforcement learning (RL), generally doesn't make the model inherently smarter; only the initial self-supervised pretraining does. Still, it would be strange if no amount of reinforcement learning could make an LLM truly smarter.