by Const-me
1246 days ago
Here's my D3D11 implementation of speech-to-text: https://github.com/Const-me/Whisper With the medium model it needs 1.43 GB of assets and 2 GB of VRAM, and on gaming GPUs it runs at 10x realtime speed. These performance figures might be good enough for modern video games. BTW, the model understands almost 100 spoken languages and can translate them to English.