by bobosha 1363 days ago
> In video calls, encoding and decoding is actually a significant cost of video calls, not just networking. Right now the peak is Zoom's 30 video streams onscreen, but with 1000x CPUs you can have 100s of high quality streams with advanced face detection and superscaling[1]. Advanced computer vision models could analyze each face creating a face mesh of vectors, then send those vector changes across the wire instead of a video frame. The receiving computers could then reconstruct the face for each frame. This could completely turn video calling into a CPU restricted task.

Interesting. How do you see this as different from the deep-learning-based video coding that was recently demonstrated? [1]
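
For concreteness, here is a rough sketch of the "send the vector changes instead of a frame" idea from the quote, as I understand it. Everything in it is an assumption on my part: the 2D landmark list, the function names `encode_delta` and `apply_delta`, and the quantization step are hypothetical, and a real system would also need a face tracker in front and a renderer behind.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]          # one 2D face-mesh landmark (assumed format)
Delta = Dict[int, Tuple[int, int]]   # landmark index -> quantized (dx, dy)


def encode_delta(prev: List[Point], curr: List[Point], step: float = 0.5) -> Delta:
    """Sender side: keep only landmarks that moved by at least one quantization step.

    `prev` should be the mesh the receiver currently holds (the reconstructed
    one, not the raw tracker output), otherwise rounding errors accumulate.
    """
    delta: Delta = {}
    for i, ((px, py), (cx, cy)) in enumerate(zip(prev, curr)):
        qx = round((cx - px) / step)
        qy = round((cy - py) / step)
        if qx or qy:                 # unchanged landmarks cost nothing on the wire
            delta[i] = (qx, qy)
    return delta


def apply_delta(prev: List[Point], delta: Delta, step: float = 0.5) -> List[Point]:
    """Receiver side: rebuild the current mesh from the previous one plus the delta."""
    out = list(prev)
    for i, (qx, qy) in delta.items():
        x, y = out[i]
        out[i] = (x + qx * step, y + qy * step)
    return out
```

With a mostly still face the per-frame payload is a handful of small integers rather than a compressed image, which is where the bandwidth-for-compute trade in the quote would come from.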

[1] https://dl.acm.org/doi/10.1145/3368405