by whycome
245 days ago
It’s not absurd to think that you could send a model of your voice to a receiving party and then have your audio call essentially be encoded text that gets run through the voice generator on the local machine. AI video could mean that essential elements are preserved (actors?) while other elements are generated locally. Hell, digital doubles for actors could also mean only their movements are transmitted, essentially just sending the mo-cap data. The future is gonna be weird.
> It would be interesting to see how far you could get using deepfakes as a method for video call compression.
> Train a model locally ahead of time and upload it to a server, then whenever you have a call scheduled the model is downloaded in advance by the other participants.
> Now, instead of having to send video data, you only have to send a representation of the facial movements so that the recipients can render it on their end. When the tech is a little further along, it should be possible to get good quality video using only a fraction of the bandwidth.
— https://news.ycombinator.com/item?id=22907718
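To put a rough number on "only send a representation of the facial movements", here is a minimal back-of-envelope sketch. Everything in it is an assumption for illustration, not something from the quoted comment: 68 landmarks per face, 30 frames per second, 16-bit quantization of normalized coordinates, and the helper names `pack_frame`, `unpack_frame` and `bitrate_kbps`.

```python
import struct

NUM_LANDMARKS = 68  # assumption: landmark count chosen for illustration
FPS = 30            # assumption: frames per second


def pack_frame(landmarks):
    """Quantize normalized (x, y) pairs in [0, 1] to uint16 and pack them little-endian."""
    flat = []
    for x, y in landmarks:
        flat.append(min(65535, max(0, round(x * 65535))))
        flat.append(min(65535, max(0, round(y * 65535))))
    return struct.pack(f"<{len(flat)}H", *flat)


def unpack_frame(payload):
    """Inverse of pack_frame: recover approximate (x, y) pairs on the receiving side."""
    count = len(payload) // 2
    values = struct.unpack(f"<{count}H", payload)
    return [(values[i] / 65535, values[i + 1] / 65535) for i in range(0, count, 2)]


def bitrate_kbps(payload_bytes, fps=FPS):
    """Raw bitrate in kb/s if one payload of this size is sent per frame."""
    return payload_bytes * 8 * fps / 1000


# A made-up frame of landmarks, just to exercise the functions.
frame = [(i / NUM_LANDMARKS, 0.5) for i in range(NUM_LANDMARKS)]
payload = pack_frame(frame)
restored = unpack_frame(payload)
print(len(payload), "bytes per frame,", bitrate_kbps(len(payload)), "kb/s raw")
```

Under these assumptions each frame is 272 bytes, before any delta or entropy coding. Whether that ends up as "a fraction of the bandwidth" depends entirely on the numbers you plug in and on what the receiving model needs beyond landmarks.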
Specifically for voice, this was mentioned:
> A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet
— https://news.ycombinator.com/item?id=19520194
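For a sense of scale, a quick sketch of what 1.6 kb/s means in bytes, next to the "just send text" idea. The 1.6 kb/s figure comes from the linked title; the speaking rate and bytes per word are assumptions made up for this comparison, and `bytes_for` is a hypothetical helper.

```python
def bytes_for(bitrate_bps, seconds):
    """Bytes needed to carry a constant-bitrate stream for the given duration."""
    return bitrate_bps * seconds / 8


VOCODER_BPS = 1600       # 1.6 kb/s, from the linked title
WORDS_PER_MINUTE = 150   # assumption: rough conversational speaking rate
BYTES_PER_WORD = 6       # assumption: average word plus a space, plain ASCII

# Equivalent bitrate of sending the spoken words as uncompressed text.
text_bps = WORDS_PER_MINUTE * BYTES_PER_WORD * 8 / 60

vocoder_bytes_per_minute = bytes_for(VOCODER_BPS, 60)
text_bytes_per_minute = bytes_for(text_bps, 60)

print("vocoder:", vocoder_bytes_per_minute, "bytes/min")
print("text:   ", text_bytes_per_minute, "bytes/min")
```

So a minute of speech at 1.6 kb/s is 12,000 bytes, and under the assumed speaking rate the raw text would be smaller still, at the cost of losing everything the text doesn't capture (timing, intonation, emotion) unless that gets sent separately.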