Hacker News
by ivape 309 days ago
I don't think we can say that until we hear how Genie 3 and Veo 3 were trained. My hunch is that the next-gen multimodal models that combine world, video, text, and image models can only be trained on the best chips.