Hacker News
by echelon 1055 days ago
This looks like something between fine-tuning a top layer and a zero-shot approach.

This is probably what future voice models will look like as they begin to capture prosody and other fine characteristics in a few hundred KB.