HN Mirror


by MyFirstSass 557 days ago

Wow, this is bad. And by bad I mean worse than leading open source and existing alternatives.

Is it me or does it seem like OpenAI revolutionized with both ChatGPT and Sora, but they've completely hit the ceiling?

Honestly a bit surprised it happened so fast!

10 comments

lanthissa 556 days ago

I think we're in the snapdragon age of AI for the next little bit, if you were around for early smartphones.

Each company would either rush to get a phone out with the new snapdragon chip, or take their time to polish a release and have a better phone late cycle. But the real improvements were just the chip.

Nvidia chips/larger data centers are the chips. The models are the plethora of android phones each generation.

That kept going until progress stabilized. Then the best user experience & vertical integration won over chasing chip performance (apple).


tom1337 557 days ago

Same goes with DALL-E. It was cool to try it the first week or so but now the output is so much worse than Midjourney and Stable Diffusion. For me it can’t even generate straight lines and everything looks comic-ish.


vunderba 557 days ago

DALL-E 3 image quality has always been subpar, but its prompt adherence is on par with FLUX. Midjourney has some of the worst prompt adherence, but some of the best image quality.


CamperBob2 556 days ago

DALL-E 3 image quality was absolutely amazing... for about 3 days. Then they must have panicked, because after that, everything it emitted included that ridiculous telltale orange/blue tint.


amzn-throw 556 days ago

To me this is just a simple artifact of size & attention.

Another example of this is stuff like Bluesky. There's a lot of reasons to hate Twitter/X, but people going "Wow, Bluesky is so amazing, there's no ads and it's so much less toxic!" aren't complimenting Bluesky, they're just noting that it's smaller, has less attention, and so they don't have ads or the toxic masses YET.

GenAI image generation is an obvious vector for all sorts of problems, from copyrighted material, to real life people, to porn, and so on. OpenAI and Google have to be extraordinarily strict about this due to all the attention on them, and so end up locking down artistic expression dramatically.

Midjourney and Stable Diffusion may have equal stature amongst tech people, but in the public sphere they're unknowns. So they can get away with more risk.


Liquix 556 days ago

>OpenAI and Google have to be extraordinarily strict

Why? Did the inventors of VHS tapes "have to be extraordinarily strict" and bake in safeguards because people might violate copyright laws, make porn, or tape something illegal?

Enforcing laws is the responsibility of the legal system. It sets a concerning precedent when companies like OAI would rather lobotomize their flagship products than risk them generating any Wrongthink.


lacoolj 557 days ago

If you're going to say something like this, you need to back it up with specific alternatives that provide a better result.

Besides just citing your sources, I'm genuinely curious what the best ones are for this so I can see the competition :)


echelon 557 days ago

HunYuan released by Tencent [1] is much better than Sora. It's 100% open source, is compatible with fine tuning, ComfyUI, control nets, and is receiving lots of active development.

That's not the only open video model, either. Lightricks' LTX, Genmo's Mochi, and Black Forest Labs' upcoming models will all be open source video foundation models.

Sora is commoditized like Dall-E at this point.

Video will be dominated by players like Flux and Stable Diffusion.

[1] https://github.com/Tencent/HunyuanVideo/


vlovich123 557 days ago

Something being available OSS is very different from a turnkey product solution, not to mention that Tencent's 60 GiB requirement requires a setup with like at least 3-4 GPUs which is quite rare & fairly expensive vs something time-sharing like Sora where you pay a relatively small amount per video.

I think the important thing is task quality and I haven't seen any evaluations of that yet.
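A rough back-of-the-envelope sketch of the GPU-count arithmetic above. It assumes 24 GiB consumer cards (the figure is an assumption for illustration, not something stated in the comment) and that the model can be split evenly across cards with no per-card overhead, which is optimistic. The function name `gpus_needed` is hypothetical.

```python
import math


def gpus_needed(required_gib: float, per_gpu_gib: float) -> int:
    """Minimum number of cards whose combined memory covers the requirement.

    Ignores activation/framework overhead and assumes a perfectly even split,
    so treat the result as a lower bound, not a sizing recommendation.
    """
    if required_gib <= 0 or per_gpu_gib <= 0:
        raise ValueError("memory sizes must be positive")
    return math.ceil(required_gib / per_gpu_gib)


# 60 GiB requirement (from the comment) on assumed 24 GiB cards
print(gpus_needed(60, 24))
```

Under those assumptions the lower bound is three cards; real overhead would push that toward the "3-4" mentioned above.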


echelon 556 days ago

> Something being available OSS is very different from a turnkey product solution, not to mention that Tencent's 60 GiB requirement requires a setup with like at least 3-4 GPUs which is quite rare & fairly expensive vs something time-sharing like Sora where you pay a relatively small amount per video.

It took two weeks to go from Mochi running on 8xH100s to running on 3090s. I don't think you appreciate the rapidity at which open source moves in this space.

HunYuan landed less than one week ago with just one modality (text-to-video), and it's already got LoRA training and fine tuning code, Comfy nodes, and control nets. Their roadmap is technically impressive and has many more control levers in scope.

I don't think you realize how "commodity" these models are and how closed off "turn key solutions" quickly get out-innovated by the wider ecosystem: nobody talks about or uses Dall-E to any extent anymore. It's all about open models like Flux and Stable Diffusion.

{Text/Image/Video}-to-Video is an inadequate modality for creative work anyway, and OpenAI is already behind on pairing other types of input with their models. This is something that the open ecosystem is excelling at. We have perfect syncing to dance choreography, music reactive textures, and character consistency. Sora has none of that and will likely never have those things.

> something time-sharing like Sora where you pay a relatively small amount per video.

Creators would prefer to run all of this on their own machines rather than pay for hosted SaaS that costs them thousands of dollars.

And for those who do prefer SaaS, there are abundant solutions for running hosted Comfy and a constellation of other models on demand.


SamPatt 556 days ago

If you've got a 4090 and ComfyUI can you run HunYuan?


bildung 556 days ago

There are already Hunyuan fp8 examples running on a 4090 on r/stablediffusion.


satvikpendem 556 days ago

RunwayML too, but I'm not sure they won't also get commoditized by OSS video generation.


tshaddox 557 days ago

What are the leading alternatives? (Open source or otherwise)


vunderba 557 days ago

You have to be specific. What's more important to you?

- Uncensored output (SD + LoRA)

- Overall speed of generation (Midjourney)

- Image quality (probably Midjourney, or an SDXL checkpoint + upscaler)

- Prompt adherence (FLUX, DALL-E 3)

EDIT: This is strictly around image generation. The main video competitors are Kling, Hailuo, and Runway.


sebazzz 557 days ago

SD does not generate video, does it?


xvector 556 days ago

https://stable-diffusion-art.com/animatediff/


CryptoBanker 556 days ago

It does as of recently.


amrrs 557 days ago

Minimax and Kling 1.5, both from China. Recently Tencent launched its own.

You can see more model samples here: https://youtu.be/bCAV_9O1ioc


ztratar 557 days ago

Those look... far worse? What am I missing?


amrrs 557 days ago

Exactly, I don't know how people are saying SORA is bad. I know there are restrictions with humans. But with the storyboard and other customisations, it's definitely up there!


stuckkeys 557 days ago

FLUX


elorant 557 days ago

MidJourney (commercial), Standard Diffusion XL


aruametello 557 days ago

> Standard Diffusion XL

you probably meant Stable Diffusion XL. (autocorrect victim)


kranke155 557 days ago

Sora was not really that big of a revolution, it was just catching up with competitors. I would even say in gen video they are behind right now.


SV_BubbleTime 557 days ago

Sora had some sweet cherry picked initial hype videos. That was more impressive than anything we could do at the time. Now, yea, it's questionable if it's on-par let alone better.


kranke155 556 days ago

Wasn't just cherry picked. The balloon kid video had a VFX team cleaning up the output. They've said that now.


pawelduda 557 days ago

What is the best model in your opinion right now?


kranke155 556 days ago

There are a lot of them, but Runway seems to have good controls and they are aligned with people who will actually use it - filmmakers and content creators.

In terms of image quality, Runway, Luma, and a few of the Chinese models all give "ok" results. I haven't seen anything from Sora to convince me they have made any kind of significant leap.

The issue there is alignment. It's cheap for Runway or Luma to continue on this path since it's their only path to profitability; they do nothing else.

But for OpenAI, I don't think this is near the top of their list of priorities. I doubt that they will be able to keep adding features like their competitors. Seems to me like this is the equivalent of a side project for them.

Edit: after watching direct comparison videos, I've changed my mind. Sora is ahead.


kranke155 556 days ago

UPDATE: After watching direct comparison videos between prompts, I do think now that Sora is ahead. It's not a huge leap but it seems much better at keeping fine details roughly aligned.

For anyone who is curious where to find tons of SORA videos, go to reddit r/aivideo


echelon 557 days ago

HunYuan by Tencent. It's 100% open source too.


ElectroNomad 557 days ago

RunwayML


joe_the_user 557 days ago

Bad also in the sense that once you get over the "boy, it's amazing they can do that", you immediately think "boy, they really shouldn't do that".


torginus 556 days ago

My working theory is that OpenAI is the 'moonshot' kind of company full of super smart researchers who like tackling hard problems, but have no time or energy for things like 'how do we create a UX people actually want to use', which actually requires a ton of painful back-and-forth and thoughtful design work.

This is not a problem as long as they do the ChatGPT thing, and sell an API and let others figure out how to build a UX around it, but here they seem to be gunning for creating a boxed product.


doctorpangloss 556 days ago

Yeah… they have defined the UX that everyone else is copying thus far. So I feel like you are pretty far off the mark.


shadowerm 556 days ago

No doubt. I was waiting so long for Sora but Runway already burned me out on AI video.

It was fun for a few days but far more limited than I would have ever expected.

Maybe Sora 5.0 will be something special. Right now though all these video models are basically shit.


Banditoz 557 days ago

What are some of the open source video models?


wslh 557 days ago

Could it be that text sources are plentiful, and denser than the training data available for videos and images?
