Hacker News
by TobiasEnholmX 492 days ago
AI-generated video struggles with consistency: flickering, weird proportions, and characters that change appearance. I tried using 3D as a way to get more consistency.

Overall, it worked. No sudden changes in proportions, clothing, or style. Still, there are some limitations, especially with fine details.

We’re looking into whether this could be useful as a tool and would love to hear what you think: has anyone experimented with 3D + AI generation for images or video, or does anyone see a better way to approach this?

Demo and details in the blog: https://backdroptech.github.io/3d-to-video/

3 comments

We and several other startups tried this, and we even filed a few patents.

1. Few users have the experience and patience for this. Blocking out a scene and animating it are tough, and the users with this skill and inclination are already using Blender and ComfyUI or submitting renders to RunwayML V2V. It's still too early for AI auto-rigging and animation to work, though those technologies will make this approach easier.

2. AI video users want I2V quality and predictability, not unpredictable V2V style transfer. You need control over the exact look and feel of the starting frame as well as the animation. If you can't get this, the renders are useless.

3. One of the advantages of AI video is that it can animate things a human animator cannot easily do. Non-humanoids, crowd movements, explosions, etc.

Basically, this requires deep integration with a new class of video model.

You'll find that even with this technology perfected, it fits into a comprehensive suite of tools that AI video creators will use. They will still lean on I2V for most shots and V2V compositing for other shots.

I've done a lot of hands-on interviews and demos. Steve May, various studios, schools, etc. Steve kind of negged me and told me there are bigger players working on this. My guess was Odyssey Systems at the time, but they turned out to be working on something else.

I do think this is a valuable technology, but there's a tremendous amount of work to do to make it work.

First off, I’d love to hear more about your experience. If you’re up for a chat, shoot me an email at tobias@backdrop.tech, or let me know where I can learn more about what you’ve worked on!

1. I assume you mean patience when setting up a 3D scene? That’s definitely a factor, but it’s getting easier with image-to-3D tools, and AI can even assist with object placement to speed things up.

2. Yeah, predictability is key. Our approach is about making it easier to generate high-quality, consistent images, which are then fed into video models, rather than relying on direct video-to-video style transfer, which can be more chaotic.

3. Agreed! AI can animate things that traditional methods struggle with, but consistency is still a challenge. This workflow helps strike a balance between AI flexibility and user control.

Yes, it's better to use AI to automate individual steps in the actual animation process itself, with AI workers.

The quality is identical to human animators (because: same tools, same process).

Only the cost is lower (although training the AI workers is a new cost).

The only companies that can do this are those that have a very strong workflow, because AI workers operate on individual steps, NOT the entire workflow.

2D animation already has this, so it's easier to adapt to AI workers than 3D (for that reason).

Nice demo examples. I'm a casual observer and curious how this differs from Gaussian splatting, which also (implicitly?) uses 3D representations.

I could see applying changes at the 3D model level which wouldn't be directly accessible if it was only an internal representation.

Yeah, exactly. Gaussian Splatting works great when you have an image (or set of images) and want to reconstruct a whole scene in 3D, but it treats everything as a unified point-based representation. In a structured 3D scene, though, objects are clearly separated, so you can manipulate them individually.

For example, you can attach a LoRA specifically to one object and run a separate workflow just for that, giving you way more control. That’s a big difference—Gaussian Splatting doesn’t naturally lend itself to object-level edits since everything is blended into the same representation.
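To make the object-level idea concrete, here is a rough sketch of what "objects are clearly separated" could look like as a data structure. Everything in it is hypothetical and for illustration only: the `SceneObject` and `Scene` classes, the `attach_lora` and `plan_passes` methods, and the LoRA name are made up here and are not taken from the blog post or from any real library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class SceneObject:
    """One separately addressable object in a structured 3D scene (hypothetical)."""
    name: str
    position: Tuple[float, float, float]
    lora: Optional[str] = None  # hypothetical LoRA identifier; None means no LoRA attached


@dataclass
class Scene:
    """A scene as a list of distinct objects rather than one blended representation."""
    objects: List[SceneObject] = field(default_factory=list)

    def attach_lora(self, object_name: str, lora: str) -> None:
        """Attach a LoRA to exactly one object, leaving the others untouched."""
        for obj in self.objects:
            if obj.name == object_name:
                obj.lora = lora
                return
        raise KeyError(f"no object named {object_name!r}")

    def plan_passes(self) -> List[Dict[str, Optional[str]]]:
        """List one generation pass per object, each carrying its own LoRA (or None)."""
        return [{"object": obj.name, "lora": obj.lora} for obj in self.objects]


scene = Scene([
    SceneObject("character", (0.0, 0.0, 0.0)),
    SceneObject("car", (2.0, 0.0, -1.0)),
])
scene.attach_lora("character", "hero_style_lora")  # made-up LoRA name
passes = scene.plan_passes()
```

The point of the sketch is only that each object has its own handle, so an edit or a separate workflow can target `"character"` without touching `"car"`. A representation where everything is blended together has no such per-object entry to attach things to.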