by rjwilmsi
968 days ago
I agree. When using the small or medium en models, either for real-time speech recognition of a native English speaker or for transcribing podcasts by native English speakers, the error rate is nowhere near 10%. I'd put it at something like 1%, and most of those errors are arguably subjective decisions about punctuation. But I have found that error rates are much higher on the tiny model and higher on the base model. I therefore assume the 10% word error rate is on very difficult audio, such as pilots speaking to Air Traffic Control (distorted or clipped microphones with significant background noise). I personally find that kind of audio difficult to understand 100%, even though I'm a native English speaker and even when both the pilots and the controllers are native English speakers.
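
For anyone who wants to put a number on their own transcripts, here is a minimal sketch of how word error rate is usually computed: word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference. The function name `word_error_rate` is my own, not from any library, and I'm assuming plain whitespace tokenization with lowercasing only. Punctuation is deliberately not stripped, so a missing comma counts as an error, which is the kind of "subjective" error mentioned above. Published benchmarks often normalize text differently, so treat the numbers as rough.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: tokens are whitespace-separated and lowercased;
    punctuation stays attached to words and therefore counts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (minus the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "cleared for takeoff runway two seven left wind calm report airborne"
    hyp = "cleared for takeoff runway two seven left wind calm report airport"
    print(f"WER: {word_error_rate(ref, hyp):.1%}")
```

One wrong word in an eleven-word sentence is already about 9%, so a 10% rate means roughly one error per short sentence, while 1% is closer to one error per paragraph.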