by Metricon
458 days ago
If you run the `gguf_orpheus.py` file in that repository, it will capture the audio tokens and convert them to a .wav file. With a little more work, you can play the streaming audio directly using `sounddevice` and `OutputStream`.

On an NVIDIA 4090, it's producing:

prompt eval time = 17.93 ms / 24 tokens (0.75 ms per token, 1338.39 tokens per second)
eval time = 2382.95 ms / 421 tokens (5.66 ms per token, 176.67 tokens per second)
total time = 2400.89 ms / 445 tokens
*A correction to the llama.cpp server command above: there are 29 layers, so it should read "-ngl 29" to load all the layers onto the GPU.
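For anyone who wants to try the `sounddevice` route, here is a rough sketch of what I mean, not code from the repository. It assumes the decoder hands you raw 16-bit mono PCM as `bytes` chunks at 24 kHz (check the sample rate that `gguf_orpheus.py` writes into the .wav). The names `align_to_samples` and `play_stream` are made up for illustration.

```python
from typing import Iterable, Iterator

SAMPLE_RATE = 24_000   # assumption: 24 kHz mono output; adjust to match your decoder
BYTES_PER_SAMPLE = 2   # 16-bit PCM


def align_to_samples(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-chunk a byte stream so every yielded block holds whole 16-bit samples."""
    leftover = b""
    for chunk in chunks:
        data = leftover + chunk
        usable = len(data) - (len(data) % BYTES_PER_SAMPLE)
        if usable:
            yield data[:usable]
        # Carry an odd trailing byte over to the next chunk.
        leftover = data[usable:]
    # A dangling odd byte at the very end is dropped.


def play_stream(chunks: Iterable[bytes]) -> None:
    """Write PCM chunks to the default output device as they arrive."""
    # Imported here so the helper above works without audio hardware.
    import numpy as np
    import sounddevice as sd

    with sd.OutputStream(samplerate=SAMPLE_RATE, channels=1, dtype="int16") as stream:
        for block in align_to_samples(chunks):
            # write() blocks until the device has accepted the samples.
            stream.write(np.frombuffer(block, dtype=np.int16))
```

You would call `play_stream(...)` with whatever generator yields the decoded audio chunks, in place of the step that appends them to the .wav file. Because `write()` blocks, playback paces the loop, so generation only needs to stay ahead of real time.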