by PhunkyPhil 36 days ago
> In distillation, you take a set of prompts you are interested in, and record the big LLM's outputs, then train your small model to produce the same output as the big LLM.

Why use the bigger LLM's outputs for this and not human outputs? If we assume that human responses to prompts are better than those of SOTA models (in some cases they are), then why use the big model at all?

1 comment

Have you seen what annotations cost? A human annotation can run on the order of $50 for a reasonable document, and some agentic annotations cost over $1000 each, whereas a model response might cost $0.10, or maybe $20 for an agentic session. Plus, human annotations take a ton of effort to collect.
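
To put rough numbers on that gap, here is a back-of-the-envelope sketch in Python. The per-item prices are the ballpark figures from this comment; the dataset size of 10,000 examples and all variable names are assumptions made up for illustration.

```python
# Ballpark per-item prices from the comment, stored in cents to avoid float noise.
HUMAN_DOC_CENTS = 50_00        # ~$50 per human document annotation
HUMAN_AGENTIC_CENTS = 1000_00  # ~$1000 per human agentic annotation
MODEL_DOC_CENTS = 10           # ~$0.10 per model response
MODEL_AGENTIC_CENTS = 20_00    # ~$20 per model agentic session

N_EXAMPLES = 10_000  # hypothetical dataset size, not from the thread


def total_dollars(n_examples: int, cents_each: int) -> float:
    """Total cost in dollars for n_examples items at cents_each per item."""
    return n_examples * cents_each / 100


human_total = total_dollars(N_EXAMPLES, HUMAN_DOC_CENTS)
model_total = total_dollars(N_EXAMPLES, MODEL_DOC_CENTS)

# How many times more expensive the human route is, per item.
doc_ratio = HUMAN_DOC_CENTS / MODEL_DOC_CENTS
agentic_ratio = HUMAN_AGENTIC_CENTS / MODEL_AGENTIC_CENTS

print(f"human: ${human_total:,.0f}  model: ${model_total:,.0f}")
print(f"document ratio: {doc_ratio:.0f}x  agentic ratio: {agentic_ratio:.0f}x")
```

At these prices the human route is hundreds of times more expensive for plain documents and tens of times more for agentic tasks, before counting the effort of recruiting and managing annotators.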

You can set up model distillation as a weekend batch job.
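
As a rough idea of what the data-collection half of that batch job could look like, here is a minimal sketch. `teacher_generate` is a hypothetical stand-in for whatever call reaches the big LLM, the prompts are made up, and the training step for the small model is left out entirely.

```python
import json
from typing import Callable, Dict, List


def teacher_generate(prompt: str) -> str:
    """Hypothetical placeholder for a call to the big LLM.

    Swap in a real client call; this stub only echoes the prompt.
    """
    return f"[teacher answer to: {prompt}]"


def build_distillation_set(
    prompts: List[str], teacher: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Record the teacher's output for every prompt of interest."""
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]


def to_jsonl(records: List[Dict[str, str]]) -> str:
    """Serialize records one JSON object per line, a common fine-tuning format."""
    return "\n".join(json.dumps(r) for r in records)


# Made-up prompts for illustration only.
prompts = ["Summarize this contract clause.", "Classify this support ticket."]
records = build_distillation_set(prompts, teacher_generate)
jsonl = to_jsonl(records)
print(jsonl)
```

The resulting prompt/completion pairs would then be used as supervised fine-tuning data for the small model, so that it learns to produce the same output as the big one.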