Hacker News
by manrajsingh 2194 days ago
At our lab, we work extensively on problems that involve speech data. This includes tasks like speech recognition, speech scoring, emotion recognition, topic detection and speaker diarisation. Some of these tasks have public data available, but for others, such as speech scoring and low-resource speech recognition, the data available for supervised learning is fairly limited. Hence, we developed this annotation tool to build corpora for our needs.

In case it's still not clear: it does not do the transcription. It does not. Oh hi Mark. It asks you to annotate the audio manually (in case you want to prepare a training data set for your algorithm); it's not an AI algorithm.
This is the most helpful comment here. I still don’t understand what the tool is for though. Up until now I assumed it would allow me to get automatic transcriptions, including breaking them down by speaker.
I was looking into that space recently and have used otter.ai for transcriptions. It gives you 6000 minutes/month for 8 USD, which is insanely cheap for that kind of service. Their British English language model is quite good as well.
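To put "insanely cheap" in numbers: assuming the full monthly quota is used and taking the price exactly as quoted above (not independently checked), the cost per hour of audio works out to:

```latex
\frac{8\ \text{USD}}{6000\ \text{min}} \approx 0.0013\ \frac{\text{USD}}{\text{min}},
\qquad
\frac{8\ \text{USD}}{6000\ \text{min}} \times 60\ \frac{\text{min}}{\text{h}} = 0.08\ \frac{\text{USD}}{\text{h}}
```

If only part of the quota is used, the effective per-hour cost is correspondingly higher.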

I've bulk-exported the generated SRT/VTT files for my favorite podcasts, and I'm using tinysearch (which was posted here recently) together with ableplayer to provide full-text search over the audio of my Jekyll-published podcast posts, with clickable timestamps that play the audio at each matching phrase.

Whenever I want to know what a podcaster has to say on a specific subject, a quick search makes such a difference!
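The core idea behind that setup, turning exported subtitle cues into a phrase search that returns timestamps, can be sketched in a few lines. This is a minimal illustration, not the commenter's actual pipeline: it does not use tinysearch or ableplayer, it assumes standard SRT/WebVTT timing lines (`00:01:02.500 --> 00:01:05.000`), and the names `Cue`, `parse_cues` and `search` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches an SRT/WebVTT timing line such as "00:01:02.500 --> 00:01:05.000".
# Hours are optional (WebVTT allows "01:02.500"); SRT uses "," before the
# milliseconds, WebVTT uses ".", so both separators are accepted.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})\s*-->\s*"
    r"(?:(\d+):)?(\d{2}):(\d{2})[.,](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    text: str


def _seconds(h, m, s, ms):
    """Convert captured timestamp parts to seconds (missing hours count as 0)."""
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(subtitle_text: str) -> list[Cue]:
    """Parse SRT or WebVTT cues; blocks without a timing line (headers, notes) are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", subtitle_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = " ".join(lines[i + 1:]).strip()
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def search(cues: list[Cue], phrase: str) -> list[tuple[float, str]]:
    """Return (start_seconds, text) for each cue containing the phrase, case-insensitively."""
    needle = phrase.lower()
    return [(c.start, c.text) for c in cues if needle in c.text.lower()]
```

A returned start time could then drive the "clickable timestamp", for example by setting an HTML `<audio>` element's `currentTime` to that value. A real search index would also need to handle phrases that span two consecutive cues, which this sketch does not.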

Awesome. Thanks for the info. I look forward to trying out your suggestion.
So this tool is mostly a way to store your dataset?

E.g., things like forced alignment should be done in other tools, and then you use the API to put the results into the dataset?