|
> Zero-shot learning (ZSL) is a problem setup in deep learning where, at test time, a learner observes samples from classes which were not observed during training, and needs to predict the class that they belong to. The name is a play on words based on the earlier concept of one-shot learning, in which classification can be learned from only one, or a few, examples. https://en.wikipedia.org/wiki/Zero-shot_learning edit: since there seems to be some degree of confusion regarding this definition, I'll break it down more simply: We are modeling the conditional probability P(Audio|Voice). If the model samples from this distribution for a Voice class not observed during training, it is by definition zero-shot. "Prediction" here is not a simple classification, but the estimation of this conditional probability distribution for a Voice class not observed during training. Providing reference audio to a model at inference-time is no different than including an AGENTS.md when interacting with an LLM. You're providing context, not updating the model weights. |