by andrew-w 425 days ago
One way this differs is in the model architecture. Our approach relies on a single pass of a diffusion transformer (DiT), whereas Live Portrait relies on intermediate representations and multiple distinct modules. Getting a DiT to run in real time was a big part of our work. Quoting the Live Portrait paper: "Diffusion-based portrait animation methods [...] are usually [too] computationally expensive." As you hinted at, we had to compromise on resolution to get there (this demo is 256x256), but we think that will improve over time.

Not relying on facial keypoints means we can animate a wide range of non-humanoid characters. My favorite is talking to the Doge meme.