by EForEndeavour
1204 days ago
I agree with you, but the reason is cost and convenience. Whisper v2 costs $0.006 per minute of transcribed audio (https://openai.com/pricing). If you had meetings every working hour, you'd have up to ~160 hours of audio per month to transcribe. For most people, this is a gross overestimate. Throwing all of that audio at OpenAI's API would cost $57.60 per month, and it also frees you from having to set up and maintain local inference.
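For reference, here is the arithmetic behind that figure as a small Python sketch. It assumes ~160 hours means 4 weeks of 40 working hours and uses the $0.006/minute rate quoted above; actual pricing may have changed since.

```python
# Whisper API rate quoted in the comment, in USD per minute of audio.
PRICE_PER_MINUTE = 0.006

def monthly_cost(hours_per_month: float, price_per_minute: float = PRICE_PER_MINUTE) -> float:
    """Cost in USD of transcribing the given hours of audio, rounded to cents."""
    return round(hours_per_month * 60 * price_per_minute, 2)

# Assumption: ~160 hours = 4 weeks x 40 working hours, all of it meetings.
hours = 160
monthly = monthly_cost(hours)
print(f"${monthly:.2f} per month")      # $57.60 per month
print(f"${monthly * 12:.2f} per year")  # $691.20 per year
```

The yearly line is just the monthly figure times 12, which is where a number of roughly $700 a year comes from.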
Convenience: yes, it's a nicer interface, but the current state of the "geeky" version is: type a command on the command line, with the path to the file. The end. Unless you're really afraid of the command line, it's not that much more convenient.
Having the text line highlighted while you listen is nice, but a) we wrote something that did it at the word level (as opposed to the sentence-ish level) nearly 20 years ago, and b) in this context it's not actually that useful. With video, sure: you can click the text and jump to the right place in the video. With spoken audio (what this is best at), you click and jump to the point... where they're saying what you just read. Unless you really want to hear what you just read, there's not a lot of added value.
Would it be good for podcasts to use an interface like this for playback? Absolutely. It'd be a massive upgrade, but that's not what this is offering.
Maybe someone will extract that code and let us combine the MP3 and the timestamped text file into a web site (if that doesn't already exist). That'd be cool.
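A minimal sketch of what such a page could look like, assuming the transcript is available as a list of (start, end, text) segments in seconds. The segment format and the `build_player_page` helper are hypothetical, not taken from any particular tool. It writes an HTML page with an `<audio>` element, wraps each segment in a clickable span that seeks playback to its start time, and highlights the segment currently playing.

```python
import html

# Hypothetical segment format: (start_seconds, end_seconds, text).
Segment = tuple[float, float, str]

# Browser-side logic: click to seek, and highlight the segment being played.
PLAYER_SCRIPT = """<script>
const audio = document.getElementById("audio");
const segs = document.querySelectorAll(".seg");
// Clicking a segment seeks the audio to that segment's start time.
segs.forEach(s => s.addEventListener("click", () => {
  audio.currentTime = parseFloat(s.dataset.start);
  audio.play();
}));
// Highlight whichever segment contains the current playback position.
audio.addEventListener("timeupdate", () => {
  const t = audio.currentTime;
  segs.forEach(s => s.classList.toggle("active",
    t >= parseFloat(s.dataset.start) && t < parseFloat(s.dataset.end)));
});
</script>"""

def build_player_page(audio_src: str, segments: list[Segment]) -> str:
    """Return a standalone HTML page pairing an audio file with its transcript."""
    spans = [
        f'<span class="seg" data-start="{start:.2f}" data-end="{end:.2f}">'
        f"{html.escape(text)}</span>"
        for start, end, text in segments
    ]
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<style>.seg{cursor:pointer}.active{background:yellow}</style></head><body>\n"
        f'<audio id="audio" controls src="{html.escape(audio_src, quote=True)}"></audio>\n'
        f"<p>{' '.join(spans)}</p>\n{PLAYER_SCRIPT}\n</body></html>"
    )

if __name__ == "__main__":
    page = build_player_page("meeting.mp3", [(0.0, 2.5, "Hello & welcome."),
                                             (2.5, 5.0, "Next item.")])
    with open("player.html", "w", encoding="utf-8") as f:
        f.write(page)
```

Segment-level spans keep the sketch short; word-level highlighting would work the same way with one span per word, provided the transcript carries word timestamps.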
But the cost you propose is way too much for most people, especially in countries that aren't rich. In many places, $400 a month is a really good salary. So yes, if you're rich, roughly $700 a year (12 × $57.60) is not a big deal, but...