by burkaman 618 days ago
Yes, and like pretty much every AI release I've seen, even these cherry-picked examples mostly do not quite match the given prompt. The outputs are genuinely incredible, but if you imagine actually trying to use this for work, it would be very frustrating. A few examples from this page:

Pumpkin patch - Not sitting on the grass, not wearing a scarf, no rows of pumpkins the way most people would imagine.

Sloth - that's not really a tropical drink, and we can't see enough of the background to call it a "tropical world".

Fire spinner - not wearing a green cloth around his waist

Ghost - Not facing the mirror, obviously not reflected the way the prompter intended. No old beams, no cloth-covered furniture, not what I would call "cool and natural light". This is probably the most impressively realistic-looking example, but it almost certainly doesn't come close to matching what the prompter was imagining.

Monkey - boat doesn't have a rudder, no trees or lush greenery

Science lab - no rainbow wallpaper

This seems like nitpicking, and again I can't overstate how unbelievable the technology is, but the process of making any kind of video or movie involves translating a very specific vision from your brain to reality. I can't think of many applications where "anything that looks good and vaguely matches the assignment" is the goal. I guess stock footage videographers should be concerned.

This all matches my experience using any kind of AI tool. Once I get past my astonishment at the quality of the results, I find it's almost always impossible to get the output I'm looking for. The details matter, and in most cases they are the only thing that matters.

1 comments

The one thing that immediately stood out to me in the ghost example was how the face of the ghost had "wobbly geometry" and didn't appear physically coupled to the sheet. This, and the way the fruit in the sloth's drink magically rested on top of the drink without being wedged onto the edge of the glass as that would require, were some of the more immediate "this isn't real" moments for me.
The ghost is insanely impressive; it's the example that gave me a "wow" effect. The cloth physics look stunning. I never thought we would reach such a level of temporal coherence so fast.
I think those types of visual glitches can probably be fixed with more or better training, and I have no doubt that future versions of this type of system will produce outputs that are indistinguishable from real videos.

But better training can't fix the more general problem that I'm describing. Perfect-looking videos aren't useful if you can't get the model to follow your instructions.