by echelon
1055 days ago
This looks like something between fine-tuning a top layer and a zero-shot approach. This is probably what future voice models will start to look like as they learn to capture prosody and other fine-grained characteristics in a few hundred KB.