by sailingparrot 133 days ago

> you don't need to make a video model. You probably don't need to decode the latents at all.

If you don't decode, how do you judge quality, in a world where generative metrics are famously hard to get right and imprecise? And how do you integrate RLHF/RLAF into your pipeline without decoding? That is no longer a step you can skip if you want SotA.

Just look at the companies that are explicitly aiming for robotics/simulation: they *are* doing video models.