Hacker News new | ask | show | jobs
by yourcousinbilly 1371 days ago
Video engineer here. Many seemingly network restricted tasks could be unlocked with faster CPUS doing advanced compression and decompression.

1. Video Calls

In video calls, encoding and decoding is actually a significant cost of video calls, not just networking. Right now the peak is Zoom's 30 video streams onscreen, but with 1000x CPUS you can have 100s of high quality streams with advanced face detection and superscaling[1]. Advanced computer vision models could analyze each face creating a face mesh of vectors, then send those vector changes across the wire instead of a video frame. The receiving computers could then reconstruct the face for each frame. This could completely turn video calling into a CPU restricted task.

2. Incredible Realistic and Vast Virtual Worlds

Imagine the most advanced movie realistic CGI being generated for each frame. Something like the new Lion King or Avatar like worlds being created before you through your VR headset. With extremely advanced eye tracking and graphics, VR would hit that next level of realism. AR and VR use cases could explode with incredibly light headsets.

To be imaginative, you could have everything from huge concerts to regular meetings take play in the real world, but be scanned and sent to VR participants in real time. The entire space including the room and whiteboard or live audience could be rendered in realtime for all VR participants.

[1] https://developer.nvidia.com/maxine-getting-started

1 comments

> In video calls, encoding and decoding is actually a significant cost of video calls, not just networking. Right now the peak is Zoom's 30 video streams onscreen, but with 1000x CPUS you can have 100s of high quality streams with advanced face detection and superscaling[1]. Advanced computer vision models could analyze each face creating a face mesh of vectors, then send those vector changes across the wire instead of a video frame. The receiving computers could then reconstruct the face for each frame. This could completely turn video calling into a CPU restricted task.

Interesting, how do you see this different from deep learning based video coding recently demonstrated? [1]

[1]https://dl.acm.org/doi/10.1145/3368405