by kamilafsar
1289 days ago
I wonder how the consistency between written and spoken language correlates with the per-language breakdown here:
https://github.com/openai/whisper/blob/main/language-breakdo...
For instance, I know Turkish is very consistent: its writing was refactored in 1928, shortly after the founding of the Republic of Turkey. Turkish ranks quite high. I don't think that's because there's loads of data available, but because of its consistency. In contrast, English has loads of data, which should compensate for its inconsistency.