I am just paying for a somewhat expensive server and I love how it's really fast but also I have a lot of free GPU time so might as well let others use it too lol. It's an experiment to see whether people will use it productively or whether someone will abuse it and ruin it for others lol.
Maybe I will put in some mechanism to prevent that, but for now I just want to see if people find it useful. The code is also open source, and I will write tutorials so people can set up their own instance as well.
It would be more useful if one could paste links to online videos directly as well. But yeah, in general this is extremely useful. I'm looking forward to video site integration. It would be great if YouTube could finally retire their horrible auto caption function for something that actually works. Being able to easily watch media in different languages from around the world will be an absolute game changer.
I also plan to support automatic language translation. I actually have that working locally already. I work for one of the big alt-video platforms, and rumour has it that I will be shipping this feature for them soon (auto transcription with auto-translated subtitles).
Also, I have tested that (auto download) with youtube-dl. It works fine, but I haven't put it into the frontend yet. I may as well, since it helps a lot on your own instance: you don't have to download the video first and then upload it.
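For anyone wondering what the auto download step might look like, here is a minimal sketch that shells out to the youtube-dl CLI and keeps only the audio track. This is not the project's actual code: the function names `build_download_command` and `download_audio`, the WAV output format, and the output directory are all assumptions made for illustration. It also assumes youtube-dl and ffmpeg are installed on the server.

```python
import subprocess
from pathlib import Path


def build_download_command(url: str, out_dir: str) -> list[str]:
    """Build a youtube-dl command line that saves only the audio track.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # %(id)s and %(ext)s are youtube-dl output template fields.
    template = str(Path(out_dir) / "%(id)s.%(ext)s")
    return [
        "youtube-dl",
        "--extract-audio",        # keep audio only (needs ffmpeg or avconv)
        "--audio-format", "wav",  # convert the audio to WAV
        "--no-playlist",          # fetch one video even if the URL points into a playlist
        "--output", template,
        "--",                     # end of options, so a URL starting with "-" is not parsed as a flag
        url,
    ]


def download_audio(url: str, out_dir: str) -> None:
    """Run youtube-dl; raises subprocess.CalledProcessError if the download fails."""
    # Passing a list (no shell=True) avoids shell injection from user-supplied URLs.
    subprocess.run(build_download_command(url, out_dir), check=True)
```

Calling `download_audio(link, "uploads")` would then leave an audio file in `uploads/` that can be handed to the same transcription path an uploaded file goes through.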
I set up the server to transcribe only two files at a time, so yeah, someone could definitely abuse it with two big uploads and leave everyone else stuck in the queue. But for me, even a 3 hour video takes only about 30 minutes with the large model, so it wouldn't be too bad. Hopefully everyone is considerate enough not to do that. So far nobody has abused it, which is cool.
Me again - why two at a time? In my initial testing with whisper-asr-webservice and my RTX 3090, I could pretty easily throw ~10 different files at it simultaneously, since there is some natural staggering from API entry, CPU conversion/resampling/transcoding of audio, the actual audio length, network effects like upload speed, etc.
I also implemented some anti-abuse-ish features between traefik and Cloudflare that should help it stand up better in the face of bad actors abusing it.
Certainly not something to depend on, but I thought I'd mention it.
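The anti-abuse setup mentioned above lives in the proxy layer (traefik and Cloudflare), and its exact rules aren't given here. As a rough picture of the general idea only, per-client rate limiting is often done with a token bucket. The sketch below is a plain-Python stand-in, not the actual configuration; the class names, the rate, and the capacity are all assumptions.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PerClientLimiter:
    """Keeps one bucket per client identifier (for example an IP address)."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

A server would call `limiter.allow(client_ip)` before accepting an upload and reject the request when it returns `False`. The injectable `clock` is there only to make the behaviour easy to test.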
> I am just paying for a somewhat expensive server and I love how it's really fast but also I have a lot of free GPU time so might as well let others use it too lol.
Thanks! Whisper is a lot of fun, but it didn't take long before I wanted to build a frontend. Then I built something that I think came out super nice, so why not share it with people? I used to pay $100/month for transcriptions and this works a lot better for me, so I might as well open-source transcription if I can. But I give all the credit to Whisper; that module they put out is amazing.
Very generous of you. I made a similar free service 3 years ago using much worse tools, and it's so cool to see Whisper making it all so much better and more efficient. Thanks for releasing it for free.
No problem! I am just seeing how it runs. I might throw up a referral link to Vast and put up a tutorial on how to host your own service; maybe that can offset the cost a bit? The current server is $700/month. Maybe it could just run off donations, who knows.