by crucifiction 3159 days ago
It's only useful for training if they also have a transcript made by a human. Eavesdropping on conversations and having armies of people transcribe them seems like a very expensive and illegal way to get that data when there are probably millions of samples (TV shows, etc.) with both voice and transcription already available off the shelf.

The 2017 F8 developer conference featured a lot on machine vision. Processing images and video for objects, such as for automatic closed captioning, is where vast resources are focused. I highly recommend watching a few of the videos; the systems shown are spatially aware as well and can infer the orientation of obscured things like limbs. Microsoft has real-time audio translation, so doing machine transcription at scale is totally feasible.
The OP I was replying to was insinuating that FB is collecting audio as training data to create AI models like the ones you are talking about. I was pointing out that raw audio on its own is useless for training a model to recognize words. The whole point of training data is that you have an input and a known output (the transcription) that you can train and test the model with; having just the input is useless for training.
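To make the input/known-output point concrete, here is a minimal toy sketch. Everything in it is made up for illustration: the two-number "audio features", the words, and the helper names (`split_usable`, `train_centroids`, `predict`) are assumptions, and a nearest-centroid classifier stands in for a real speech model. It only shows that the training step consumes (input, transcript) pairs and has nothing to do with a clip that has no transcript.

```python
from collections import defaultdict

# Hypothetical toy data: (features, transcript). None means "raw audio, no transcript".
samples = [
    ([0.1, 0.9], "yes"),
    ([0.2, 0.8], "yes"),
    ([0.9, 0.1], "no"),
    ([0.8, 0.2], "no"),
    ([0.5, 0.5], None),
]

def split_usable(samples):
    """Separate transcribed samples from raw, untranscribed ones."""
    labeled = [(x, y) for x, y in samples if y is not None]
    unlabeled = [x for x, y in samples if y is None]
    return labeled, unlabeled

def train_centroids(labeled):
    """Average the feature vectors of each word; this needs the word for every sample."""
    sums = {}
    counts = defaultdict(int)
    for features, word in labeled:
        if word not in sums:
            sums[word] = [0.0] * len(features)
        sums[word] = [s + f for s, f in zip(sums[word], features)]
        counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}

def predict(centroids, features):
    """Return the word whose centroid is closest (squared Euclidean distance)."""
    def dist2(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda w: dist2(centroids[w]))

labeled, unlabeled = split_usable(samples)
model = train_centroids(labeled)

# Testing also needs known outputs, so the held-out set is labeled too.
held_out = [([0.3, 0.7], "yes"), ([0.7, 0.3], "no")]
correct = sum(predict(model, x) == y for x, y in held_out)
```

The untranscribed clip ends up in `unlabeled` and never reaches `train_centroids`, because there is no word to attach its features to.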