Hacker News
by nl 108 days ago
Model distillation is very useful!

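Put it like this: Reinforcement Learning from Human Feedback (RLHF) is already useful with only a few hundred examples, and LLM distillation is basically the same thing, except the examples come from a larger model's outputs instead of from human raters.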