Hacker News
by lossolo 323 days ago
> Could you train a small language model with a big one?

Yes, it's called distillation.
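
To make that concrete, here is a minimal sketch of the soft-target loss commonly used in distillation, in plain Python. The small model (student) is trained to match the big model's (teacher's) softened output distribution. The function names, the example logits, and the temperature value are all made up for illustration. A real setup would compute this in a framework with autograd and usually mix it with the ordinary cross-entropy loss on the true labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    Hypothetical helper for illustration, not any library's API.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the big model
    q = softmax(student_logits, temperature)  # the small model's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Made-up logits for a single 3-class example.
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
bad_student = [0.5, 1.0, 4.0]    # ranks the classes the other way round

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```

The loss is zero when the student's softened distribution matches the teacher's exactly and grows as they diverge, so minimizing it pulls the small model toward the big one's behavior. The T^2 factor keeps gradient magnitudes comparable across temperatures.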