
Let me rephrase that for the haters: the quality of that transcription API was godawful.

I took a ten-minute audio segment from a two-person interview and chopped it up into shorter segments to fit under the 60-second limit, with varying overlap durations to make sure that full sentences would be included on either side of each cut. I ran a battery of tests with segments of 20s, 30s, 40s, and 50s and overlaps of 3s, 5s, and 10s. The output was essentially useless garbage, with wild differences in the transcription depending on segment length and overlap duration. In one configuration one sentence might be perfectly transcribed and the next was word salad; in another both sentences were word salad; in another half of each sentence was right but words were missing, etc. No configuration ever yielded a useful output. Time and money spent: several hours, $$$.
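For anyone who wants to reproduce that kind of test grid, here is a rough sketch of how the cut points could be computed. It only works out start/end times in seconds; it does not touch audio or call any transcription API. The function name `segment_bounds` and the constants are made up for illustration, and the stepping rule (each segment starts `segment length - overlap` seconds after the previous one) is my assumption about one reasonable way to do the overlap, not necessarily how the original test was cut.

```python
from itertools import product


def segment_bounds(total_s, seg_s, overlap_s):
    """Return (start, end) pairs in seconds that cover [0, total_s].

    Consecutive segments share `overlap_s` seconds. Hypothetical helper,
    not part of any transcription library.
    """
    if not 0 <= overlap_s < seg_s:
        raise ValueError("overlap must be non-negative and shorter than the segment")
    step = seg_s - overlap_s  # how far each new segment advances
    bounds = []
    start = 0
    while True:
        end = min(start + seg_s, total_s)  # clip the final segment
        bounds.append((start, end))
        if end >= total_s:
            break
        start += step
    return bounds


TOTAL_S = 600                      # ten minutes of audio
SEGMENT_LENGTHS = (20, 30, 40, 50)  # all under the 60-second limit
OVERLAPS = (3, 5, 10)

# One list of cut points per (segment length, overlap) configuration.
grid = {
    (seg, ov): segment_bounds(TOTAL_S, seg, ov)
    for seg, ov in product(SEGMENT_LENGTHS, OVERLAPS)
}

if __name__ == "__main__":
    for (seg, ov), bounds in grid.items():
        print(f"{seg}s segments, {ov}s overlap: {len(bounds)} chunks")
```

That gives 12 configurations. Each list of bounds can then be handed to whatever tool does the actual slicing and upload.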