Hacker News
by column 1058 days ago
Do you use Whisper for the transcript (which version? base?) and GPT-3.5-turbo for the language model? Do you provide a self-hosted solution for companies that don't want their meetings going "in the cloud"? I do not mean to be dismissive of all your work, and I know too well that the devil is in the details, but what are the key advantages of using your solution over having a Python dev (or GPT-4) write a similar tool using, for example, LangChain + Whisper + Llama 2? Again, please do not take this as a cheap shot. I might not be the target audience, but if I were to use such a tool, I would want everything to run locally because of privacy and corporate-spying concerns. Thanks!

EDIT: It is also unclear whether you support languages other than English. Whisper does, so in theory you should too. There are companies out there where English is not the working language.
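
For concreteness, the do-it-yourself local pipeline mentioned above might look roughly like the sketch below. It assumes the open-source `openai-whisper` package and ffmpeg are installed. The helper names (`transcribe_locally`, `chunk_transcript`, `build_summary_prompt`), the prompt wording, and the 1500-word chunk size are made up for illustration, and the call to a local LLM such as Llama 2 is left out because it depends on whichever runner you pick.

```python
from typing import List, Optional


def transcribe_locally(audio_path: str, model_name: str = "base",
                       language: Optional[str] = None) -> str:
    """Transcribe an audio file on the local machine with openai-whisper."""
    # Imported lazily so the helpers below work even without the package.
    import whisper  # assumption: `pip install openai-whisper`, plus ffmpeg

    model = whisper.load_model(model_name)
    # language=None lets Whisper detect the spoken language itself.
    result = model.transcribe(audio_path, language=language)
    return result["text"].strip()


def chunk_transcript(text: str, max_words: int = 1500) -> List[str]:
    """Split a transcript into word-bounded chunks for a small context window."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def build_summary_prompt(chunk: str) -> str:
    """Hypothetical prompt template; feed the result to any locally hosted LLM."""
    return (
        "Summarize the following meeting transcript excerpt as bullet points, "
        "listing decisions and action items.\n\n"
        f"Transcript:\n{chunk}\n\nSummary:"
    )
```

Nothing here leaves the machine: audio goes in, text comes out, and each chunk's prompt can be handed to a local model. The hard parts of a real product (speaker diarization, accuracy on noisy calls, long-context summarization) are exactly what this sketch skips.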

2 comments

They have their own ASR model, Conformer-2 [0], and support 9 languages (they count it as 12) [1].

It looks like their synchronous transcription is much slower than Whisper, but if you need it fast, you need their real-time ASR (or Amazon's or Google's).

[0] Conformer-2 is trained on 1.1M hours of English: https://www.assemblyai.com/blog/conformer-2/

[1] https://www.assemblyai.com/docs/Concepts/supported_languages

You can use Deepgram, which has its own model but also offers the option of using Whisper hosted by them.