by eginhard, 3182 days ago
Recordings are force-aligned to the transcriptions anyway (using essentially a speech recognition system) to obtain phone-level alignments. You don't need explicit timing information beforehand.
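To make that concrete, here is a minimal sketch of forced alignment as a Viterbi pass over a known phone sequence. It assumes per-frame log-likelihoods are already available from some acoustic model. The function name `force_align`, the `log_probs` layout, and the toy phone labels are hypothetical and chosen for illustration; this is not any particular toolkit's API, and real aligners use much richer models.

```python
import math


def force_align(log_probs, phones):
    """Toy forced alignment (illustrative sketch, not a real toolkit API).

    log_probs: list over frames; each item maps a phone label to the
               log-likelihood of that phone at that frame (assumed given).
    phones:    the phone sequence known from the transcription.
    Returns a list of (phone, start_frame, end_frame), end exclusive.
    """
    T, N = len(log_probs), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per phone")

    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]  # best score ending in phone j at frame t
    back = [[0] * N for _ in range(T)]     # phone index at frame t-1 on that best path

    score[0][0] = log_probs[0][phones[0]]  # the path must start in the first phone
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]                       # remain in the same phone
            move = score[t - 1][j - 1] if j > 0 else NEG  # advance to the next phone
            if move > stay:
                best, back[t][j] = move, j - 1
            else:
                best, back[t][j] = stay, j
            if best > NEG:
                score[t][j] = best + log_probs[t][phones[j]]

    # Backtrace from the last phone at the last frame.
    path = [0] * T
    j = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = j
        j = back[t][j]

    # Collapse the frame-level path into one segment per phone.
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or path[t] != path[t - 1]:
            segments.append((phones[path[start]], start, t))
            start = t
    return segments


if __name__ == "__main__":
    hi, lo = math.log(0.8), math.log(0.1)

    def frame(likely):
        # Made-up scores: one phone is likely, the others are not.
        return {p: (hi if p == likely else lo) for p in ("k", "ae", "t")}

    frames = [frame(p) for p in ("k", "k", "ae", "ae", "ae", "t")]
    print(force_align(frames, ["k", "ae", "t"]))
    # prints [('k', 0, 2), ('ae', 2, 5), ('t', 5, 6)]
```

The only inputs are the audio-derived scores and the phone sequence from the transcription; the phone boundaries fall out of the best monotonic path, which is why no timing information is needed up front.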