| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mwigdahl 319 days ago
	Is this just distillation but with a step to filter out low-quality responses first?

1 comments

GabrielBianconi 319 days ago

AFAIK, distillation typically refers to tuning on the logits of the larger model, so you wouldn't be able to do that with fine-tuning APIs (OpenAI + Google in our blog post). We fine-tune on the outputs themselves.

But broadly speaking, yes, we generate data using a large model, curate the best samples using metrics from the environment, and fine-tune on that data. This isn't a novel technique from an academic perspective; our focus is on applying it to different use cases (e.g. agentic RAG, agentic tool use) and models (OpenAI, Google, Qwen).

Thanks!

link

littlestymaar 319 days ago

> AFAIK, distillation typically refers to tuning on the logits of the larger model

I think this is called “logit distillation” which is a particular form of distillation but not the only one.

> so you wouldn't be able to do that with fine-tuning APIs (OpenAI + Google in our blog post)

Dististillation from competitors' API is so common it has been given a name: it's called “distealing”.

link

mwigdahl 319 days ago

Thanks for the explanation and the clarification on terminology! I've used a similar approach myself and it sounded like you were doing something similar.

link