Hacker News
by Jevon23 1375 days ago
Well, we’ll see how it performs, if it’s ever made public.

The 20B images don’t look that much more impressive than what SD is already doing (aside from the ability to render text), and in some cases they look worse. It’s hard to tell because the resolution is so small, but even in the 20B “astronaut riding a horse through a pond” image, it looks like his hands are still nonsensical.

This nitpick about hands sounds desperate. Here we are, with a tech so powerful that it overshadows the default hype it's surrounded by (no small feat, most technologies fail to live up to the hype as you know) ... and the critics merely move the goalpost a tiny bit further, even if the tech scales so well as to make their new goalpost irrelevant in a year.
It's not a nitpick. It might be a nitpick if hands were the only thing it couldn't do. But it struggles with a lot more than just hands.

>the tech scales so well as to make their new goalpost irrelevant in a year.

This just brings me back to my original question. Self-driving cars have been "a year away" for many years now, and now companies are starting to hint that human assistance may be required for the foreseeable future [1]. So, why the confidence that art will be an easy problem to solve with just more scaling, when that approach hasn't eliminated the need for humans in any other domain?

[1] https://www.reuters.com/technology/truly-autonomous-cars-may...

I have a suspicion that generative art is going to hit a data wall, also. All of these models are constrained in what patterns they can learn because image captions are not very precise. They can rehash common motifs associated with keywords, but they’re not good at following specific instructions. (“The chair is at the corner of the rug, turned 15 degrees to the left, with the leg nearest the camera aligned with the edge of the fireplace.”) For them to meaningfully improve in this regard, I have to imagine someone will need to locate a trove of a few billion images with exceptionally high quality captions, and well distributed throughout the space of possible image types, subjects, themes, and styles.
I think that details like angle and position will be resolved by using basic sketches as a starting point (we can already make images that sort of conform to layouts as well as prompts), subdividing the image into assets that the model then has to stitch together in subsequent steps, and then adjusting lighting/contrast/style as a set of filters in post-processing. The wall is lowered quite a bit when you don't insist on doing everything from a single magic prompt.

(This will be great from the point of view of art creation; not so great from the point of view of supposedly rendering humans obsolete)

That makes sense. I don’t think that will render humans obsolete; I think it will just increase their productivity and ultimately raise the standard of quality expected. It means artists can explore and iterate on ideas faster than if they had to lay down preliminary artifacts manually. But it doesn’t eliminate the need for authorship: someone still needs to decide what to communicate visually and how to communicate it.
Sure, but eventually we’re going to hit the environmental and cost limits of the power needed for training, at which point it’s no longer worth the cost to train the model.

AFAIUI, that’s part of the point that Gebru was trying to make before she was fired.

For now I can run Stable Diffusion on a vintage laptop from a decade ago, on CPU (!), and it doesn't even use most of my RAM. And training this model was still cheap compared to, say, a senior Google engineer's yearly salary. The limits of scaling are further out than laymen may want to believe.

With an order of magnitude more parameters it won't just do hands, it will do quite a bit more.

Gebru doesn't have anything good to say about AI; it's only downsides. She's biased, being an activist and all.
Her position doesn’t invalidate her arguments. How good is a product that can’t stand up to criticism?

Edit: to tie it back to the original criticism: what’s the maximum training cost we’re willing to accept for the model? How can we guarantee return greater than the increased training cost?

This makes me wonder what the costs of training an actual human artist are. Are AIs less efficient?
You don't have to retrain an AI to spin up another instance: just download a 4 GB weights file via a magnet link floating around the internet and run some Python code in a terminal on your old PC. This kills the comparison.

And training a real, living, breathing person in a rich OECD country is going to be costly (no offense meant, I'm actually not from an OECD country).

You can create a lot more copies of the AI.

But then again, when it comes to commercial needs, a human doesn't need "retraining" every time you ask them to draw something they weren't familiar with when they went through art school...
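
For what it's worth, the "train once, copy the weights for free" argument in this subthread can be put in rough numbers. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and every figure in the example is a hypothetical placeholder, not a real cost for any model or artist. It also ignores the retraining point raised just above, which would add further one-time costs on the model side.

```python
import math


def amortized_cost_per_image(training_cost, inference_cost_per_image, images_generated):
    """Spread a one-time training cost over every image the model produces."""
    if images_generated <= 0:
        raise ValueError("images_generated must be positive")
    return training_cost / images_generated + inference_cost_per_image


def break_even_images(training_cost, inference_cost_per_image, human_cost_per_image):
    """Smallest image count at which the model costs no more per image than a human.

    Returns None when inference alone is at least as expensive as the human,
    in which case no volume of output ever pays back the training cost.
    """
    margin = human_cost_per_image - inference_cost_per_image
    if margin <= 0:
        return None
    return math.ceil(training_cost / margin)


if __name__ == "__main__":
    # All numbers below are hypothetical placeholders, chosen only to show the shape
    # of the calculation.
    training_cost = 500_000.0        # one-time training cost (assumed)
    inference_cost_per_image = 0.01  # power and hardware per generated image (assumed)
    human_cost_per_image = 20.0      # what a commissioned image might cost (assumed)

    n = break_even_images(training_cost, inference_cost_per_image, human_cost_per_image)
    print("break-even image count:", n)
    print("cost per image at break-even:",
          amortized_cost_per_image(training_cost, inference_cost_per_image, n))
```

The design point is just that training cost is a fixed term that shrinks with volume, while the human cost is paid per image. Whether the break-even count is reachable depends entirely on the real figures, which nobody in this thread has supplied.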