Hacker News


	by braindead_in 3383 days ago
	What's the accuracy of the alignment?


alpe 3383 days ago

aeneas is not based on ASR (i.e., it does not try to "recognize" words and align them with the input text), but on the "older" MFCC + DTW approach.

Hence, it is difficult to give you a precise answer, e.g. in terms of word-error-rate or similar metrics.
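For readers unfamiliar with the MFCC + DTW approach mentioned above, here is a minimal sketch of the DTW step. It assumes, following my reading of the aeneas documentation (this is not aeneas code), that the text is first rendered to audio by a TTS engine, so the fragment boundaries are known on the synthesized side; DTW then warps the synthesized frames onto the recorded frames, and each boundary is carried across along the warping path. To stay self-contained, each "frame" is a single number standing in for an MFCC vector, and the function names `dtw_path` and `map_boundaries` are hypothetical.

```python
def dtw_path(ref, real):
    """Classic DTW between two feature sequences.

    ref:  frames of the synthesized (TTS) audio
    real: frames of the actual recording
    Each frame is a single number here, standing in for an MFCC vector.
    Returns an optimal warping path as (ref_index, real_index) pairs.
    """
    n, m = len(ref), len(real)
    inf = float("inf")
    # cost[i][j]: minimal cumulative distance aligning ref[:i] with real[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - real[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both sequences advance
                                 cost[i - 1][j],      # ref advances only
                                 cost[i][j - 1])      # real advances only
    # Backtrack from the bottom-right cell, preferring the diagonal on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda c: cost[c[0]][c[1]])
    return path[::-1]


def map_boundaries(path, ref_starts):
    """Map fragment start frames on the synthesized side to the recording."""
    first_real = {}
    for r, j in path:
        first_real.setdefault(r, j)  # earliest real frame paired with ref frame r
    return [first_real[b] for b in ref_starts]


# Toy data: three fragments, each a constant "feature" value.
ref = [0, 0, 5, 5, 9, 9]            # synthesized: fragments start at frames 0, 2, 4
real = [0, 0, 0, 0, 5, 5, 5, 9, 9]  # the narrator speaks more slowly
path = dtw_path(ref, real)
print(map_boundaries(path, [0, 2, 4]))  # → [0, 4, 7]
```

Multiplying the mapped frame indices by the frame hop gives times in seconds. Real implementations compare MFCC vectors rather than scalars and usually restrict the search to a band around the diagonal to save time and memory.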

For the task aeneas was designed for, aligning an ebook with the corresponding audiobook, and for similar tasks (e.g., captioning videos of lectures or other spoken-only content), it generally produces an alignment that is indistinguishable from a manually produced one.

If you want to see some examples, read and listen to one of these audio-ebooks, whose alignment was produced by aeneas: https://www.readbeyond.it/ebooks.html

Of course, if you want to align at a finer level (e.g., single words), or to noisier or non-matching audio, the quality of the alignment can deteriorate.


braindead_in 3382 days ago

Thanks for the explanation. Will it work if there are gaps in the transcript? E.g., a clean verbatim transcript, where the "ah"s and "uhm"s are left out.


alpe 3382 days ago

Several aeneas users who produce caption files for videos have told me that it does. Considering how DTW works, this is plausible: DTW can map a stretch of extra audio, such as an untranscribed filler, onto a single point of the text.
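To illustrate why DTW plausibly tolerates a clean verbatim transcript, here is a toy run on hypothetical data, again with one number per frame instead of an MFCC vector. The synthesized side has no "uhm", while the recording contains two extra frames for it. Because DTW lets several recorded frames map to one synthesized frame, the filler gets absorbed by the neighbouring fragment it resembles more, and the next fragment still starts at the right frame. In real audio this would presumably show up as the previous caption lasting slightly longer; this is a reading of the technique, not a measured result for aeneas.

```python
def dtw_path(ref, real):
    """Return an optimal DTW warping path as (ref_index, real_index) pairs."""
    n, m = len(ref), len(real)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - real[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda c: cost[c[0]][c[1]])
    return path[::-1]


# Hypothetical toy data: transcribed fragment A (value 1), then fragment B (value 7).
ref = [1, 1, 1, 7, 7, 7]         # synthesized from the clean transcript (no "uhm")
real = [1, 1, 1, 3, 3, 7, 7, 7]  # recording: frames 3-4 are an untranscribed "uhm"
path = dtw_path(ref, real)

# Recorded frames assigned to each fragment (ref frames 0-2 are A, 3-5 are B).
frag_a = sorted({j for r, j in path if r < 3})
frag_b = sorted({j for r, j in path if r >= 3})
print(frag_a, frag_b)  # → [0, 1, 2, 3, 4] [5, 6, 7]
```

Here the filler frames (value 3) are closer to fragment A, so A absorbs them and B still begins at recorded frame 5. A filler equally distant from both neighbours would be assigned according to the tie-breaking rule instead.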

Unfortunately, I have not had the time to set up a suitable corpus and perform a rigorous evaluation, so I cannot answer your question with a definitive "yes".
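For anyone who wants to run such an evaluation, one common approach (an assumption on my part, not something aeneas prescribes) is to compare the predicted fragment start times against manually produced ones, reporting the mean absolute error and the share of boundaries within a tolerance. The function name `boundary_errors`, the 0.1 s tolerance, and the times below are all hypothetical, made up for illustration.

```python
def boundary_errors(predicted, reference, tolerance=0.1):
    """Compare predicted fragment start times (seconds) with manual ones.

    Returns (mean absolute error, fraction of boundaries within `tolerance`).
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    mae = sum(errors) / len(errors)
    within = sum(e <= tolerance for e in errors) / len(errors)
    return mae, within


# Made-up times, for illustration only.
predicted = [0.00, 2.48, 5.20, 8.02]  # e.g. start times from an aligner
reference = [0.00, 2.40, 5.00, 8.30]  # e.g. start times set by hand
mae, within = boundary_errors(predicted, reference)
print(round(mae, 3), within)  # → 0.14 0.5
```

Reporting both numbers helps: the mean error can hide a few badly misplaced boundaries, while the tolerance fraction shows how often the alignment is "good enough" for captions.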

Perhaps the best way to see whether aeneas works for your use case is to try it out.

If you do not want to install anything on your machine, you can use the aeneas Web app at https://aeneasweb.org: you submit an audio file (or a YouTube URL) and a text file, and you get an SRT/TTML/etc. file emailed back.


braindead_in 3377 days ago

I definitely plan to try it soon.
