by loudmax 723 days ago
Depends on your frame of reference. Compared to anything else I've seen generated on a consumer grade GPU, I'd say these are indeed pretty good.

Here's their example gallery: https://hpcaitech.github.io/Open-Sora/

Compared to the outputs from other models run on consumer grade GPUs, I'd say those are very good.


Looking at it in low res (small rectangle within the webpage on mobile) they actually look great!
What's the useful frame of reference?

Looking better than other things that are also bad is sort of interesting in that it represents progress in some direction, but it isn't very interesting to people outside of the topic.

For me, I use the Will Smith video[0] from just over a year ago. Compared to the examples it's a pretty stark difference.

[0] https://arstechnica.com/information-technology/2023/03/yes-v...

Yes, people learned not to generate other people eating. Current SOTA models still have no concept of walking (left leg, right leg, left leg, right leg; is it so complicated?), so there is no reason to believe that they have learned the peculiarities of food consumption.
We seem to be in an exponential uptick phase of tech driven by hardware improvements; a few years ago this was impossible on a consumer grade GPU. So in some sense there isn't a useful frame of reference: the state of the art should improve out of sight about every 2 years, and eventually I'd expect iPhones to be outgenerating Disney at movies.
Not GP, but when I looked at the examples, I thought that those already look pretty useable in comic book-like storytelling to set the mood. I.e. in settings where smaller details of the scene are not relevant and are not taking away from the "larger product".
Good that this frame of reference is hn and not some random website where people have no connection to ml...