Hacker News
by artificial 3159 days ago
The 2017 F8 developer conference featured a lot of work on machine vision. Processing images and video to detect objects, for example for automatic closed captioning, is where vast resources are focused. I highly recommend watching a few of the videos: the models are spatially aware as well and can infer the orientation of obscured things like limbs. Microsoft has real-time audio translation, so doing machine transcription at scale is totally feasible.

The OP I was replying to was insinuating that FB is collecting audio as training data to build AI models like the ones you're describing. My point was that raw audio on its own is useless for training a model to recognize words. The whole point of training data is that you have an input and a known output (the transcription), which you use both to train the model and to test it. Having just the input is useless for training.
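To make the input/known-output point concrete, here is a toy sketch of supervised learning. It assumes each "clip" has already been reduced to a small feature vector (hypothetical stand-in data, not real audio) and uses a nearest-centroid classifier purely for illustration; real speech recognition is far more involved. Both steps need the labels: training groups the examples by their transcription, and testing compares predictions against the known transcription.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: each "clip" is a feature vector standing in for
# audio features, paired with its known transcription (the label).
Example = Tuple[Sequence[float], str]
Model = Dict[str, List[float]]

def train_nearest_centroid(examples: List[Example]) -> Model:
    """Average the feature vectors for each label. Without labels there is
    nothing to group by, so this step cannot run on unlabeled input alone."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(model: Model, features: Sequence[float]) -> str:
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], features))

def accuracy(model: Model, examples: List[Example]) -> float:
    """Testing also needs the known output: predictions are checked against labels."""
    correct = sum(predict(model, f) == label for f, label in examples)
    return correct / len(examples)

train = [([0.9, 0.1], "yes"), ([1.0, 0.0], "yes"), ([0.1, 0.9], "no"), ([0.0, 1.0], "no")]
test = [([0.8, 0.2], "yes"), ([0.2, 0.7], "no")]

model = train_nearest_centroid(train)
print(accuracy(model, test))  # 1.0
```

Strip the labels from `train` and `test` and neither `train_nearest_centroid` nor `accuracy` has anything to work with, which is the sense in which input alone doesn't help here.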