| HN Mirror



	by zdyn5 719 days ago
	I know it’s probably using 1000x less compute than the real Sora, but “pretty good” is stretching it

2 comments

loudmax 719 days ago

Depends on your frame of reference. Compared to anything else I've seen generated on a consumer grade GPU, I'd say these are indeed pretty good.

Here's their example gallery: https://hpcaitech.github.io/Open-Sora/

Compared to the outputs from other models run on consumer grade GPUs, I'd say those are very good.


yreg 719 days ago

Looking at it in low res (small rectangle within the webpage on mobile) they actually look great!


maxerickson 719 days ago

What's the useful frame of reference?

Looking better than other things that are also bad is sort of interesting in that it represents progress in some direction, but it isn't very interesting to people outside of the topic.


clwg 719 days ago

For me I use the Will Smith video[0] from just over a year ago. Compared to the examples it's a pretty stark difference.

[0] https://arstechnica.com/information-technology/2023/03/yes-v...


Lockal 718 days ago

Yes, people learned not to generate other people eating. Current SOTA models still have no concept of walking (left leg, right leg, left leg, right leg; it's so complicated?), there is no reason to believe that they have learned the peculiarities of food consumption.


roenxi 719 days ago

We seem to be in an exponential uptick phase of tech driven by hardware improvements; a few years ago this was impossible on a consumer grade GPU. So in some sense there isn't a useful frame of reference: state of the art should improve out-of-sight about every 2 years, and eventually I'd expect iPhones to be outgenerating Disney at movies.


forkerenok 719 days ago

Not GP, but when I looked at the examples, I thought that those already look pretty useable in comic book-like storytelling to set the mood. I.e. in settings where smaller details of the scene are not relevant and are not taking away from the "larger product".


Flumio 719 days ago

Good that this frame of reference is hn and not some random website where people have no connection to ml...


thriftwy 719 days ago

Just run all key frames through stable diffusion and it should be quite good.


uh_uh 719 days ago

I don't think there's such a thing as key frames here, just frames. And if you run SD on every frame, the output will be janky because SD doesn't know about temporal coherence.


thriftwy 719 days ago

As soon as it's an MP4 it will have key frames all right. You could add AI upscaling to your encoder. People are making fun of "Just", but I believe I could take apart ffmpeg to add this feature (PoC) in two weeks or less. Provided somebody pays for my labor and for the HW.


ehsankia 719 days ago

Honestly neither does OpenSora it seems, as it is pretty damn janky already.


mejutoco 719 days ago

You can pass a few frames as a single image grid. Then you will get coherence, although it will be very limited by gpu ram.
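
A minimal sketch of the grid idea, assuming each frame is a plain 2D list of pixel values and all frames share one size. The helper names `tile_frames` and `split_grid` are made up for illustration and come from no library; the image model call that would sit between them is left out entirely.

```python
def tile_frames(frames, cols):
    """Tile equally sized frames into one grid image, in row-major order."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    if any(len(f) != height or any(len(row) != width for row in f) for f in frames):
        raise ValueError("all frames must share the same size")
    rows = -(-len(frames) // cols)  # ceiling division
    # Pad the last grid row with blank (all-zero) frames if needed.
    blank = [[0] * width for _ in range(height)]
    padded = list(frames) + [blank] * (rows * cols - len(frames))
    grid = []
    for r in range(rows):
        band = padded[r * cols:(r + 1) * cols]
        for y in range(height):
            line = []
            for frame in band:
                line.extend(frame[y])  # place frames side by side
            grid.append(line)
    return grid


def split_grid(grid, rows, cols):
    """Cut a grid image back into rows * cols frames, in row-major order."""
    height = len(grid) // rows
    width = len(grid[0]) // cols
    frames = []
    for r in range(rows):
        for c in range(cols):
            frames.append([
                line[c * width:(c + 1) * width]
                for line in grid[r * height:(r + 1) * height]
            ])
    return frames


# Four tiny 2x2 "frames" filled with 1, 2, 3 and 4.
frames = [[[i] * 2 for _ in range(2)] for i in range(1, 5)]
grid = tile_frames(frames, cols=2)
restored = split_grid(grid, rows=2, cols=2)
```

The grid holds every pixel of every frame at once, so its size grows linearly with the number of frames, which is where the GPU RAM limit comes from.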


resource_waste 719 days ago

>Just

Adding the word 'just' doesn't make it any easier. Something I've noticed is that people who have never done something themselves, and are telling someone to do a difficult task, will use:

"Just"

in front of it.

This is particularly relevant in tech.


tetris11 719 days ago

Depends if it's already been done before, in which case "just" would then have been just used quite justly.


PhoenixFlame101 719 days ago

It's very likely the comment you replied to said it in a joking sense.


sertraline 719 days ago

It is extremely difficult to tell if the person is joking in a field full of people who think AI is some sort of magic.


nineteen999 719 days ago

Well I mean calling any of this diffusion/LLM stuff "AI" is a misnomer to begin with.
