| "cost and convenience":
cost: $57.60 vs. $0
Why would you want to pay nearly $700 a year just to avoid running a program in the background on whatever computer you already have open?
convenience:
Yes, it's a nicer interface, but the current state of the "geeky" version is: type a command on the command line, with the path to the file. The end. Unless you're really afraid of the command line, it's not that much more convenient. The text line being highlighted while you listen is nice, but a) we wrote something that did it at the word level (as opposed to the sentence-ish level) nearly 20 years ago, and b) in this context it's not actually that useful. With video, sure: you can click the text and go to the right place in the video. With spoken text (what this is best at), you click and go to the point... where they're saying what you just read. Unless you really want to hear what you just read, there's not a lot of added value. Would it be good for podcasts to use an interface like this for playback? Absolutely. It'd be a massive upgrade, but that's not what this is offering. Maybe someone will extract that code and let us combine the MP3 and timestamped text file in a website (if that doesn't already exist). That'd be cool. But the cost you propose is way too much for most people, especially in countries that aren't rich. In many places $400 a month is a really good salary. So yeah, if you're rich, $700 a year is not a big deal, but... |
Second, don't underestimate the business value of a nice interface. IMO, the value of excellent UI/UX is part of why ChatGPT took off the way it did. The number of people willing to pay a few dollars per month in order to never have to see a command line is quite a bit larger than the number of people willing to host their own `whisper-large` inference.
Speaking of hosting, do you already own hardware that supports sufficiently fast inference? If not, how much would a good enough cloud instance cost you per month? It depends on how fast is fast enough, but more than $0, that's for sure.
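To put rough numbers on that, here's a quick back-of-the-envelope sketch. It assumes the $57.60 quoted above is a monthly price (which is what "nearly $700 a year" implies). The hourly cloud rate is a made-up placeholder, not a real quote from any provider, so plug in your own.

```python
# Back-of-the-envelope comparison: subscription vs. renting a cloud instance.
# Assumption: $57.60 is a monthly price, inferred from "nearly $700 a year".
SUBSCRIPTION_PER_MONTH = 57.60


def annual_cost(monthly_price: float) -> float:
    """Yearly cost of a monthly subscription."""
    return monthly_price * 12


def self_host_monthly_cost(hours_per_month: float, hourly_rate: float) -> float:
    """Monthly cost of renting an instance only while transcribing."""
    return hours_per_month * hourly_rate


def break_even_hours(hourly_rate: float,
                     subscription: float = SUBSCRIPTION_PER_MONTH) -> float:
    """Instance-hours per month at which renting costs as much as subscribing."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return subscription / hourly_rate


if __name__ == "__main__":
    hypothetical_rate = 0.50  # USD per hour; placeholder, not a real price
    print(f"Subscription per year: ${annual_cost(SUBSCRIPTION_PER_MONTH):.2f}")
    print(f"Break-even usage: {break_even_hours(hypothetical_rate):.1f} instance-hours/month")
```

If your actual usage sits well below the break-even figure, renting is cheaper on paper, but that still ignores your setup time and it's still more than $0.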