by hexaga
463 days ago
Orpheus is a Llama model trained to understand and emit audio tokens (from SNAC). Those tokens are just added to its tokenizer as extra tokens. Like most other tokens, they have text representations: '<custom_token_28631>' etc. You sample 7 of them (1 frame), parse out the ids, pass them through the SNAC decoder, and you now have a frame of audio from a 'text' pipeline. The neat thing about this design is that you can drop the model into any existing text-to-text pipeline and it just works.
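To make the "parse out the ids" step concrete, here is a rough sketch of pulling ids out of generated text and grouping them into 7-token frames. The helper names (`parse_token_ids`, `group_frames`) are made up for illustration, and the sample string is synthetic. How each id maps onto SNAC codebook entries, and the decoder call itself, are deliberately left out since they are not spelled out above.

```python
import re

# Matches the text representation of an audio token, e.g. <custom_token_28631>
TOKEN_RE = re.compile(r"<custom_token_(\d+)>")
FRAME_SIZE = 7  # tokens per audio frame, as described above


def parse_token_ids(text):
    """Extract the integer ids from every <custom_token_N> in the text."""
    return [int(m) for m in TOKEN_RE.findall(text)]


def group_frames(ids, frame_size=FRAME_SIZE):
    """Split ids into complete frames, dropping any trailing partial frame."""
    n_full = len(ids) // frame_size
    return [ids[i * frame_size:(i + 1) * frame_size] for i in range(n_full)]


# Synthetic model output: 9 tokens, i.e. one full frame plus 2 leftovers.
sample = "".join(f"<custom_token_{28631 + i}>" for i in range(9))
frames = group_frames(parse_token_ids(sample))
print(frames)
```

Each complete frame would then be handed to the SNAC decoder. The leftover tokens can be buffered until the next 7 arrive, which is what makes this easy to run on top of an ordinary streaming text pipeline.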