Hacker News
by charlesdaniels 1933 days ago
This reminds me of a technology in Vernor Vinge's "Zones of Thought" series. I think they called it "evocations": at the beginning of a call, a model is transmitted that lets the other end reconstruct what the sender would look and sound like from severely abridged data. It sure sounds plausible. The semantically meaningful parts of a conversation (video or audio) would appear to have significantly less entropy than all of the details captured by a mic or webcam. The fact that things like JPEG and MP3 exist is proof enough, and those (to my knowledge) aren't even feature-based.

Maybe N years from now, your {Skype,FaceTime,Zoom,Jitsi} call starts by transmitting a pre-trained auto-encoder that can reproduce your speech and visual appearance within a "good enough" margin of error from a few kbps worth of data.
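As a rough illustration of the "send a model first, then only a few numbers per frame" idea, here is a minimal pure-Python sketch. It swaps the trained neural auto-encoder for its simplest linear stand-in, PCA (fit by power iteration), and runs on synthetic frames. The function names `fit_model`, `encode`, and `decode` and all of the data are hypothetical, and nothing here reflects how any real calling app works.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_model(frames, k, iters=200, seed=0):
    """Sender side, done once: learn a mean frame plus the top-k principal
    directions (a linear stand-in for a trained auto-encoder) via power
    iteration with deflation. The (mu, components) pair is the "model"
    that would be transmitted at the start of the call."""
    n = len(frames)
    mu = [sum(col) / n for col in zip(*frames)]
    centered = [[x - m for x, m in zip(f, mu)] for f in frames]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in mu]
        for _ in range(iters):
            # w = (X^T X) v: the unnormalized covariance applied to v
            w = [0.0] * len(mu)
            for row in centered:
                s = dot(row, v)
                for j, x in enumerate(row):
                    w[j] += s * x
            # Deflation: project out the directions already found
            for c in components:
                p = dot(w, c)
                w = [wi - p * ci for wi, ci in zip(w, c)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                raise ValueError("no remaining variance to capture; reduce k")
            v = [wi / norm for wi in w]
        components.append(v)
    return mu, components


def encode(frame, model):
    """Per frame, sender side: these k coefficients are all that gets sent."""
    mu, components = model
    centered = [x - m for x, m in zip(frame, mu)]
    return [dot(centered, c) for c in components]


def decode(code, model):
    """Per frame, receiver side: rebuild an approximate frame from k numbers."""
    mu, components = model
    frame = list(mu)
    for z, c in zip(code, components):
        frame = [f + z * ci for f, ci in zip(frame, c)]
    return frame


if __name__ == "__main__":
    # Hypothetical 16-sample "frames" that vary along only two hidden directions
    rng = random.Random(1)
    d1 = [1.0 if j < 8 else 0.0 for j in range(16)]
    d2 = [1.0 if j % 2 == 0 else -1.0 for j in range(16)]
    base = [0.5 * j for j in range(16)]

    def make_frame():
        a, b = rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)
        return [m + a * x + b * y for m, x, y in zip(base, d1, d2)]

    model = fit_model([make_frame() for _ in range(50)], k=2)
    frame = make_frame()
    code = encode(frame, model)
    error = max(abs(r - f) for r, f in zip(decode(code, model), frame))
    print(f"sent {len(code)} numbers instead of {len(frame)}; max error {error:.2e}")
```

Only the mean frame and the k components cross the wire once; after that, each frame costs k numbers instead of one value per pixel or sample. With, say, k = 2 float32 coefficients at 30 frames per second, that is 2 × 32 × 30 = 1,920 bits/s of payload, though a real face or voice would need a nonlinear model and a much larger k to look or sound convincing.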

1 comment

It's been done, at least for video: https://www.youtube.com/watch?v=NqmMnjJ6GEg&feature=emb_titl...

Not for audio yet, I think?