by _Nat_ 986 days ago
> Wouldn’t something like isMale*P(male=.66) work fine?

It doesn't think like that.

If it did, they could've just done `P(hasFiveFingersPerHand)=0.99999`.

But it doesn't even necessarily draw what you ask it to. Instead, it applies a series of de-noising transforms that it's been trained to expect will lead toward whatever the prompt describes. Whatever those transforms produce is then, hopefully, sorta like what was requested.


Custom loss functions absolutely work and work basically the way described above.

https://colab.research.google.com/drive/1dlgggNa5Mz8sEAGU0wF...

You can see them define a custom color loss and apply it simultaneously with the regular diffusion loss. I've actually expanded this notebook to allow regional specification of the custom loss.
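To make the idea concrete, here is a minimal toy sketch of loss-based guidance with a regional mask. It is not the linked notebook's code: `color_loss_grad`, `guided_step`, the `mask` argument, and the identity "denoiser" are all hypothetical stand-ins, and a real pipeline would take the gradient through the model's predicted image rather than on raw pixels.

```python
def color_loss_grad(pixels, target, mask):
    """Return (loss, grad) for the mean squared distance to `target`,
    computed only over pixels where mask is 1 (the "region")."""
    n = sum(mask) or 1
    loss = 0.0
    grad = []
    for px, m in zip(pixels, mask):
        if m:
            diff = [c - t for c, t in zip(px, target)]
            loss += sum(d * d for d in diff) / n
            grad.append([2 * d / n for d in diff])
        else:
            grad.append([0.0, 0.0, 0.0])  # outside the region: no push
    return loss, grad


def guided_step(pixels, denoise, target, mask, scale):
    """One toy step: apply the model's own update, then nudge the result
    down the gradient of the custom loss."""
    pixels = denoise(pixels)
    _, grad = color_loss_grad(pixels, target, mask)
    return [[c - scale * g for c, g in zip(px, gr)]
            for px, gr in zip(pixels, grad)]


# Assumption: an identity function stands in for the real denoiser.
identity = lambda p: p
pixels = [[0.2, 0.2, 0.2], [0.8, 0.1, 0.1]]
mask = [1, 0]              # only the first pixel is in the guided region
target = [0.0, 0.0, 1.0]   # push the region toward blue

before, _ = color_loss_grad(pixels, target, mask)
for _ in range(50):
    pixels = guided_step(pixels, identity, target, mask, scale=0.1)
after, _ = color_loss_grad(pixels, target, mask)
```

The loss only needs to be differentiable with respect to the image, which is why a color loss is easy to write and a "has five fingers" loss is not.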

It's quite difficult to define a function that detects if an individual has 5 fingers or not. That's the real issue.

The comment I'd responded to seemed to assume that StableDiffusion picks the sex of a person according to some internal odds that could be modified.

My point was that it doesn't actually think like that. For example, prompting StableDiffusion for a picture of a doctor doesn't necessarily get it to draw a human at all, much less a doctor of a pre-determined sex; instead, StableDiffusion de-noises the image until the result emerges, where that result would (ideally) contain a doctor of whatever sex it happened to come up with.

That said, you're right that we can add more code to try to guide things.

We could even brute-force it by re-generating images over and over, or tweaking them after generation, until they match exactly what we wanted. (Realistically, something like branch-and-bound would probably be preferred to blind guess-and-check.)
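The blind guess-and-check version is just rejection sampling. A minimal sketch, assuming a hypothetical `toy_generate` that stands in for the image generator and a hypothetical `accept` check standing in for whatever detector decides that an output matches:

```python
import random


def generate_until(generate, accept, max_tries=100, seed=0):
    """Re-generate until `accept` passes or the attempt budget runs out.
    Returns (candidate, attempts), with candidate None on failure."""
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        candidate = generate(rng)
        if accept(candidate):
            return candidate, attempt
    return None, max_tries


def toy_generate(rng):
    # Hypothetical stand-in for an image generator: the "image" is just
    # a dict recording how many fingers got drawn.
    return {"fingers": rng.choice([4, 5, 5, 6])}


result, tries = generate_until(toy_generate, lambda img: img["fingers"] == 5)
```

The expected cost is 1/p generations per accepted image, where p is the chance that a single sample passes the check, so this gets expensive quickly as the requirements get stricter.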

My point was more that you can add these guardrails without having to keep track of what the model had previously generated.

And I think if you used a perfectly balanced dataset for training, you’d get these guardrails for free because the right probabilities would be baked into the model’s weights.

Yeah, the idea to use random-selection instead of keeping track of generation-history seems reasonable. The idea of guardrails from perfect-balancing seems less obvious to me.
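The random-selection idea can be sketched as drawing each attribute independently per request and appending it to the prompt, so no generation history is needed. The function names, the attribute table, and the prompt format below are hypothetical; the 0.66 figure is just the number from the quoted comment.

```python
import random


def pick_attributes(rng, distributions):
    """Draw one value per attribute from its own weighted distribution."""
    return {
        name: rng.choices(list(options), weights=list(options.values()))[0]
        for name, options in distributions.items()
    }


def build_prompt(base, rng, distributions):
    """Append the sampled attribute values to the base prompt."""
    attrs = pick_attributes(rng, distributions)
    return f"{base}, {', '.join(attrs.values())}"


# Assumed target odds, taken from the quoted P(male=.66).
distributions = {"sex": {"male": 0.66, "female": 0.34}}

rng = random.Random(0)
draws = [pick_attributes(rng, distributions)["sex"] for _ in range(10_000)]
male_share = draws.count("male") / len(draws)
prompt = build_prompt("a doctor", rng, distributions)
```

Each call is stateless, so the long-run mix converges on the target odds. Whether the model actually honors the appended attribute is a separate question.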

For example, say someone wants to generate a "US President" -- what would the ideal range of outputs be?

The article checked for just two things: sex (male or female) and skin-tone (I, II, III, IV, V, or VI). To date, all US Presidents have been male, and they were probably mostly skin-tones I or II (not bothering to check), except for Obama, who was probably something like IV (still not bothering to check).

So if we run StableDiffusion for a "US President", what would a "perfectly balanced" output look like? Should there be any women? What about the skin-tone distribution?

Also, Obama was a 2-term President, so if his skin-tone should somehow affect the distribution, should it have a stronger effect because he was in office for longer than average? Or should all US Presidents have the same effect regardless of their time in office? And either way, why?