by Jevon23 1377 days ago
>It's going to destroy the livelihoods of the majority of independent artists in a way that looks inevitable to me.

Why is SD going to destroy the livelihoods of artists when machine language translation hasn't put human translators out of work yet?

I don't think there's been any industry that's been ended by AI yet, and yet people are strangely confident that art is going to be the first.

7 comments

Some art, technical and scientific illustration in particular, requires a great deal of precision and the ability to interpret information. That work isn't going away any time soon, and is similar to what is required of professional translation. A lot of art does not require that.

Are you under the impression that right now, as of today, the publicly-available AI models are ready to replace humans for all types of art outside of scientific and technical illustration?

Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.

I'm concerned that the discussions about AI art on forums like HN get distorted because you have people sharing their views on art here, even though they don't actually have a serious and nuanced appreciation of art and they don't have a good understanding of all the types of work that artists do. Maybe you'd be fine with reading a comic book where everyone has seven melting fingers, but people who take comic books seriously as an artistic medium would not.

> Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.

This seems to be purely an issue of the size of the network. Parti (https://parti.research.google/) demonstrates that as the number of parameters increases, with no change to the underlying architecture, a lot of these problems simply go away. Basically just throw more compute and memory at the problem and everything gets fixed.

People who buy art do not buy it because of the technical execution. You may need to execute a piece in some way to get a desired effect, but the technique is the means, not the goal.

This is not to take away from the achievements of AI. It's that creating pictures adhering to a prompt with some degree of creativity is very little of what art is. Maybe it will replace some part of commissioned illustrations where the artist's name does not matter (e.g. some avatar pic?).

We still value, financially, some material goods for much more than they cost to produce. Or for much more than their almost identical mass-produced counterparts.

> It's that creating pictures adhering to a prompt with some degree of creativity is very little of what art is.

I mean, it's the majority of commercial art - you get a prompt from the client, you maybe flesh it out in a few different directions with sketches, then you refine a final piece. And AI is incredibly good at this process - instant results, infinite patience, and it's free. A very hard combination to meet.

Calendars, book covers, video game assets, green screen backgrounds....

Even in a case like video game animations, where the AI can't build every frame, it can still give you a good reference photo. From there you just need a cheap artist to fill out the frames - a huge cost savings, and a big blow to the artistic community.

Where do you get started as an artist, without any of those? Obviously, Fine Arts isn't nearly as affected, but how do you get your start when you can't build a name from your cool book covers, or get famous off Magic: The Gathering card illustrations?

Well, we’ll see how it performs, if it’s ever made public.

The 20B images don’t look that much more impressive than what SD is already doing (aside from the ability to render text), and in some cases they look worse. It’s hard to tell because the resolution is so small, but even in the 20B “astronaut riding a horse through a pond” image, it looks like his hands are still nonsensical.

This nitpick about hands sounds desperate. Here we are, with a tech so powerful that it overshadows the default hype it's surrounded by (no small feat, most technologies fail to live up to the hype as you know) ... and the critics merely move the goalpost a tiny bit further, even if the tech scales so well as to make their new goalpost irrelevant in a year.

It's not a nitpick. It might be a nitpick if hands were the only thing it couldn't do. But it struggles with a lot more than just hands.

>the tech scales so well as to make their new goalpost irrelevant in a year.

This just brings me back to my original question. Self-driving cars have been "a year away" for many years now, and now companies are starting to hint that human assistance may be required for the foreseeable future [1]. So, why the confidence that art will be an easy problem to solve with just more scaling, when that approach hasn't eliminated the need for humans in any other domain?

[1] https://www.reuters.com/technology/truly-autonomous-cars-may...

Sure, but eventually we’re going to hit environmental and cost-effectiveness limits on the power used for training, and at that point it’s not worth the cost to train the model.

AFAIUI, that’s part of the point that Gebru was trying to make before she was fired.

Doesn't "the AI" train on art produced by people? "Just expand the dataset, just increase the parameters" seems like it should hit a wall fairly quickly... and still not be very good, because deep learning systems have no insight.

Every Instagram, Facebook, and TikTok photo with associated text data is a potential pair for training.

In the smartphone age, the case for data hunger looks pretty weak.

But not as weak as the case that the route to production grade commercial art is reached via biasing the training dataset more towards sloppy social media images...

> I don't think there's been any industry that's been ended by AI yet, and yet people are strangely confident that art is going to be the first.

Technology is making something that used to take a lot of practice and skill accessible to those without any of it. A monkey can now draw two ovals, label it an owl, and run an image-to-image conversion with Stable Diffusion to get a pretty good sketch of an owl [1].

Is it better than what a good artist could do? Irrelevant.

Is it better than what a cheap illustrator I find on Fiverr could do? Irrelevant.

The only important point is that I no longer need an illustrator to get myself an owl. I draw some lines, I pick some words, and presto I have an illustration.

The question of whether it's "art" is entirely irrelevant.

> Are you under the impression that right now, as of today, the publicly-available AI models are ready to replace humans for all types of art outside of scientific and technical illustration? Because that's not true at all. The AI can't even draw hands yet. To say nothing of its ability to handle multiple people and objects interacting in complex scenes.

I think this is severely underplaying the speed at which things are changing and basing an argument on things that the AI currently can't do. DALL-E was announced in Jan 2021 and it's still locked behind API access. Stable Diffusion came out Aug 2022 and I can run it on a <$2,000 laptop. That's not even 2 years. Do you think hands are going to be a long term roadblock?

As for complex scenes, you can currently string that together with a Stable Diffusion plugin for Photoshop/GIMP.

[1] https://www.reddit.com/r/StableDiffusion/comments/wwv7zk/sta...

But if I want a good picture of an owl, I Google "owl" and get many more options than I could possibly ever have time to pick from. Stable Diffusion is essentially doing the same thing as Google, except presenting a kind of average result instead of showing me all the results in its DB.

Now, this may actually be helpful in that it gets around copyright claims - but that's the only real difference.

And you are free to search through the whole catalog of Google results until you find an owl that looks exactly like you want. Though this is going to get harder as you want something more specific than a simple owl.

But the approach for stable diffusion is just as easy whether you want just "an owl", or "an owl in X's style with A, B, and C"

Changing the prompt until it generates what I want is not that different from changing my search terms until the result I want is closer to the top.

Now, I should of course note that search engines already employ ML techniques to actually interpret search terms, so to some extent the point is moot - ML is important to actually solving this problem.

Go ahead and get me a photo off Google images of an alpaca in a suit playing chess in vibrant digital painting style.

Without meaning to sound rude about it... I'll wait.

I'd be curious to see if you could get that from the AI as well.

I tried generating that exact prompt a few times at theartbutton.ai and all the results were nonsensical.

For example: https://theartbutton.ai/image/OW1HZLfhjg6DFvJtk4vQZzUYqI7pGG...

> they don't actually have a serious and nuanced appreciation of art and they don't have a good understanding of all the types of work that artists do.

I would extend this lack of nuance and understanding to the deep learning implementation side also. A lot of people seem to have some very foundational misconceptions about what deep learning is and what it does. In the case of generative art: these models are “simply” sampling from the frozen statistical structure they have learned from web images and their captions. They don’t understand the relationship between objects in space, they have no ideas or feelings to express, and they communicate nothing. That’s why even the best output of these models tends to have a perceptible hollowness: you can detect the lack of a coherent authorial intent in it.

I recently went to stable diffusion for some art for a D&D campaign guide, to make the thing more immersive. While the pictures are impressive, there are a lot of things about the generated art that just don't make sense: In one picture, a tower had a staircase down 1/3rd of the way from the door to the ground, just stopping at that point. Most had issues like this. Several other pictures I wanted were impossible to generate.

The field of "art that needs human communication skills" seems to be a lot broader than just scientific illustration.

A significant chunk of what you're describing can be solved by a combination of better prompt engineering and repeated inpainting.

SD obviously doesn't understand language in the same way we do, so it can be tricky to describe things in a way that will match your expectations. Once you start to understand the tricks here, it gets easier and easier.

Inpainting will let you fix a lot of the rest. Staircase stops? Select the area where it stopped, get the AI to generate more. People are already doing this to create very complex artwork where there are issues with faces, hands, etc. https://www.reddit.com/r/StableDiffusion/comments/x9u8qh/img... is a great example of how you can quickly iterate over a scene.

One of the other things people struggle with is consistent characters and settings, but people have found ways to improve this with Midjourney - https://docs.google.com/document/u/1/d/e/2PACX-1vRahIr3-h_V3...

There's more of a learning curve to these tools than most people think, but it's also still miles and miles away from the learning curve required to actually be proficient at the technical aspects of making art.

> machine language translation hasn't put human translators out of work yet?

It's not clear if machine language translation may have:

- Increased "demand" for translation by reducing the price of "basic" translation.
- Increased overall globalization -- I use machine translation to communicate with contract manufacturers in China, whereas without it I might avoid using contract manufacturers in China.
- Due to increased globalization, increased demand for occasional "advanced" translation via professional translators or bilingual.

So perhaps it does greatly decrease or eliminate the large slice of the pie which reflects translation jobs that would exist at the low end while simultaneously greatly increasing the overall size of the pie.

I think it’s highly likely that machine translation will still put human translators out of work. We’re probably very close to the point where machine translation is about as good as an average human translator.

My guess is that something like 90% of translation (e.g. product manuals, websites, etc.) is now machine translated. Order a random item from China and take a look at their product manual - it's very likely that it's machine translated and not by a person.

> Order a random item from China and take a look at their product manual - it's very likely that it's machine translated and not by a person.

Yes and with luck, at least one language you know how to read will make sense and not be off-the-wall bonkers or simply incomprehensible.

Machine translation has put a lot of human translators out of work. The per-word rates for text translation are pathetic these days, even for language pairs that Google Translate struggles with.

Of course, there are still some jobs for high-quality/high-importance translation like legal work, simultaneous translation etc, but these are quite niche.

Just another chapter of the whole automation shtick.

> I don't think there's been any industry that's been ended by AI yet, and yet people are strangely confident that art is going to be the first.

Any time now.

There would be way, way more translators in a highly globalized society like today if it wasn’t for machine translation. Also, translation is an exact science for the most part; art isn’t.

I am becoming more and more convinced that many techy folks near the AI scene saw that SD et alia can create an image that convincingly (and perhaps even indistinguishably) looks like a very nice digital painting, and based on that data point alone are calling artists obsolete.

I've played around with Dall-E a bit and based on trying to create weird ideas like an army of toddlers in plate armor riding corgis into a medieval battle or a bear riding a bicycle pulling a semi truck, I'm fully convinced that it's overblown.

It can recreate data close to what it has already seen, just like all neural network techniques. It does poorly outside that domain.