| From: https://twitter.com/EMostaque/status/1760660709308846135

Some notes:
- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.
- This takes advantage of transformer improvements & can not only scale further but accept multimodal inputs..
- Will be released open, the preview is to improve its quality & safety just like og stable diffusion
- It will launch with full ecosystem of tools
- It's a new base taking advantage of latest hardware & comes in all sizes
- Enables video, 3D & more..
- Need moar GPUs..
- More technical details soon

>Can we create videos similar like sora
Given enough GPUs and good data yes.

>How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?
Its in sizes from 800m to 8b parameters now, will be all sizes for all sorts of edge to giant GPU deployment.

(adding some later replies)

>awesome. I assume these aren't heavily cherry picked seeds?
No this is all one generation. With DPO, refinement, further improvement should get better.

>Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?
yeah see @Scenario_gg's great work with IP adapters for example. Our team builds ComfyUI so you can expect some really great stuff around this...

>Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects. Imagine the new version will.
DALLE and MJ are also pipelines, you can pretty much do anything accurately with pipelines now.

>Nice. Is it an open-source / open-parameters / open-data model?
Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.

>Cool!!! What do you mean by good data? Can it directly output videos?
If we trained it on video yes, it is very much like the arch of sora. |
Very interesting. I've been stretching my 12GB 3060 as far as I can; it's exciting that smaller hardware is still usable even with modern improvements.
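For a rough sense of what "800m to 8b parameters" means on a 12GB card, here is a back-of-the-envelope sketch. It counts the main model's weights only and assumes the usual storage widths (4, 2, and 1 bytes per parameter for fp32, fp16, and int8). Activations, text encoders, the VAE, and framework overhead are ignored, so real usage would be higher. The function names are my own, and nothing here comes from the announcement beyond the two parameter counts.

```python
# Weight-only memory estimate (assumption: weights dominate; everything else is ignored).
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}  # standard storage widths

def weight_gib(num_params: int, dtype: str = "fp16") -> float:
    """Return the size of the weights in GiB for the given storage dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

def fits(num_params: int, vram_gib: float, dtype: str = "fp16") -> bool:
    """True if the weights alone fit in vram_gib (a lower bound, not a guarantee)."""
    return weight_gib(num_params, dtype) <= vram_gib

if __name__ == "__main__":
    # Parameter counts quoted in the thread; the 12 GiB budget is the 3060 mentioned above.
    for label, n in [("800m", 800_000_000), ("8b", 8_000_000_000)]:
        for dtype in BYTES_PER_PARAM:
            size = weight_gib(n, dtype)
            verdict = "fits" if fits(n, 12, dtype) else "does not fit"
            print(f"{label} @ {dtype}: {size:.2f} GiB of weights, {verdict} in 12 GiB")
```

By this estimate the 800m size is comfortable at any of these precisions, while the 8b size would need something below fp16 (or offloading) just to hold the weights on 12GB.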