by alpe
3382 days ago
aeneas is not based on ASR (i.e., it does not try to "recognize" words and align them with the input text), but on the "older" MFCC + DTW approach. Hence, it is difficult to give you a precise answer, e.g. in terms of word error rate or similar metrics. For the task aeneas has been designed for (aligning an ebook with the corresponding audiobook) and for similar tasks (e.g., captioning videos of lectures or other spoken-only content), it generally produces an alignment that is indistinguishable from a manually produced one. If you want to see some examples, read and listen to one of these audio-ebooks, whose alignment was produced by aeneas: https://www.readbeyond.it/ebooks.html But of course, if you want to align at a finer level (words) or with noisier or non-matching audio, the quality of the alignment can deteriorate.
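To make the "MFCC + DTW" idea a bit more concrete, here is a minimal, textbook dynamic time warping sketch. It is not aeneas's actual code. It assumes the MFCC vectors for both sequences have already been computed (feature extraction is omitted) and uses plain Euclidean distance between frames. The names `dtw_align` and `fragment_start_times`, and the 40 ms frame hop, are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one feature vector (e.g. MFCCs) per audio frame


def euclidean(a: Frame, b: Frame) -> float:
    """Distance between two feature frames (assumption: plain Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(ref: List[Frame], query: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: return the total cost and the warping path as (ref_idx, query_idx) pairs."""
    if not ref or not query:
        raise ValueError("both sequences must be non-empty")
    n, m = len(ref), len(query)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of ref[:i] with query[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor cell.
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def fragment_start_times(path: List[Tuple[int, int]], ref_starts: List[int],
                         hop_seconds: float = 0.04) -> List[float]:
    """Map the first reference frame of each text fragment to a time in the query audio."""
    times = []
    for r in ref_starts:
        matches = [j for i, j in path if i == r]
        if not matches:
            raise ValueError(f"reference frame {r} is not on the path")
        times.append(min(matches) * hop_seconds)
    return times


if __name__ == "__main__":
    # Toy 1-dimensional "features": the query is a time-stretched copy of the reference.
    total, path = dtw_align([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0], [2.0]])
    print(total, path)  # 0.0 [(0, 0), (0, 1), (1, 2), (2, 3), (2, 4)]
    print(fragment_start_times(path, [0, 2]))
```

A sketch like this never outputs a word hypothesis, only a warping path between two frame sequences, so its quality is more naturally judged by how far the resulting boundaries drift in time than by a word error rate.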