by cwkoss 1037 days ago
Very cool. Would be interesting to train a model on images with alpha channels so outputs would be automatically masked and more easily composable. But maybe masking is so good these days that it would be futile?

When a user does img-2-img on a layer, does it use the context from other visible layers in the generation?
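On the alpha-channel idea: once outputs carry a real alpha channel, composing them is the standard Porter-Duff "over" operator, and flattening a stack of visible layers (for example, to hand an img-2-img pass some context) is just "over" applied repeatedly. A minimal NumPy sketch, assuming straight (non-premultiplied) RGBA floats in [0, 1]; `over` and `flatten` are illustrative names, not this project's code, and nothing here says whether the tool actually feeds other layers into img-2-img.

```python
import numpy as np

def over(top: np.ndarray, bottom: np.ndarray) -> np.ndarray:
    """Composite `top` over `bottom` (Porter-Duff "over").

    Both inputs are float arrays of shape (H, W, 4) holding straight
    (non-premultiplied) RGBA values in [0, 1].
    """
    a_top = top[..., 3:4]
    a_bot = bottom[..., 3:4]
    # Resulting coverage: the top layer plus whatever of the bottom shows through.
    a_out = a_top + a_bot * (1.0 - a_top)
    # Blend in premultiplied space, then divide alpha back out (skip alpha == 0).
    rgb_premult = top[..., :3] * a_top + bottom[..., :3] * a_bot * (1.0 - a_top)
    rgb_out = np.divide(rgb_premult, a_out,
                        out=np.zeros_like(rgb_premult), where=a_out > 0)
    return np.concatenate([rgb_out, a_out], axis=-1)

def flatten(layers):
    """Flatten a bottom-to-top list of visible RGBA layers into one image."""
    result = np.zeros_like(layers[0])  # start from a fully transparent canvas
    for layer in layers:
        result = over(layer, result)
    return result
```

A model that emitted such RGBA layers directly would skip the separate masking step entirely; the compositing math stays the same either way.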


For composing, this approach works pretty well; maybe the author should consider making a UI for it.

https://multidiffusion.github.io/
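MultiDiffusion, the linked project, runs a separate diffusion path per region (each with its own prompt) and reconciles them at every denoising step; with per-pixel region weights, that reconciliation has a closed form, a per-pixel weighted average of the overlapping predictions. A simplified sketch of only that fusion step, assuming each region's prediction has already been computed and placed on the full canvas; `fuse_regions` and its arguments are hypothetical, and the diffusion model itself is omitted.

```python
import numpy as np

def fuse_regions(predictions, weights, fallback):
    """Fuse per-region denoising results into one canvas (MultiDiffusion-style).

    predictions: list of (H, W, C) arrays, each the canvas as denoised under
                 one region's prompt.
    weights:     list of (H, W) non-negative per-pixel weights (e.g. region masks).
    fallback:    (H, W, C) array used wherever no region has any weight.
    """
    num = np.zeros_like(fallback, dtype=float)
    den = np.zeros(fallback.shape[:2], dtype=float)
    for pred, w in zip(predictions, weights):
        num += w[..., None] * pred  # accumulate weighted predictions
        den += w                    # accumulate total weight per pixel
    den = den[..., None]
    # Weighted average where regions overlap; untouched pixels keep the fallback.
    return np.where(den > 0, num / np.maximum(den, 1e-12), fallback)
```

Because overlapping regions are averaged at every step rather than pasted together at the end, the seams between separately prompted areas tend to blend, which is what makes the approach attractive for composition.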

Thanks for posting. Really interesting.
Segment Anything is neat, but segmentation is far from solved.

If the user generates a picture of a horse and rider to add onto another composition, they probably want to include the saddle.

SAM can also be conditioned on points: if it's ambiguous what you want to mask, you can add a point on the saddle and the model will include it without a problem. Segmentation is pretty much solved; I agree with the parent post.
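The click-to-refine workflow described above maps onto the `predict` method of `SamPredictor` in the `segment_anything` package, where `point_labels` uses 1 for foreground and 0 for background clicks. A minimal sketch, assuming an RGB `uint8` image; `mask_from_points` is a hypothetical helper, and the predictor is passed in so the function can be exercised without downloading a checkpoint.

```python
import numpy as np

def mask_from_points(predictor, image, fg_points, bg_points=()):
    """Return the best SAM mask for the given click prompts.

    predictor: a segment_anything.SamPredictor (or any object exposing the
               same set_image/predict methods).
    image:     (H, W, 3) uint8 RGB array.
    fg_points: (x, y) pixels that should be inside the mask (label 1),
               e.g. one click on the rider and one on the saddle.
    bg_points: (x, y) pixels that should be outside the mask (label 0).
    """
    coords = np.array(list(fg_points) + list(bg_points), dtype=np.float32)
    labels = np.array([1] * len(fg_points) + [0] * len(bg_points), dtype=np.int32)
    predictor.set_image(image)
    # One click is ambiguous (rider? horse? rider plus saddle?), so request
    # several candidate masks; with multiple clicks, a single mask is usually enough.
    masks, scores, _ = predictor.predict(
        point_coords=coords,
        point_labels=labels,
        multimask_output=len(coords) == 1,
    )
    return masks[int(np.argmax(scores))]
```

With the real library, the predictor would be built roughly as `SamPredictor(sam_model_registry["vit_b"](checkpoint=...))`, pointing `checkpoint` at a downloaded model file.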
In my experience, I haven't gotten great results using SAM; maybe it was just the images I was using? They weren't great quality, and it seemed to struggle with low-contrast areas.
Whether it's audio, images, CG, or video, it's almost always GIGO (garbage in, garbage out).
> Would be interesting to train a model on images with alpha channels

Would be even more interesting to have an intermediate stage inside the ANN that holds an ontology of the content to be finally rendered, so that single items could be changed individually.

That is, an internal representation of qualified, structured items in space as part of the chain: prompt > accessible internal representation > render.
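The "prompt > accessible internal representation > render" chain can be illustrated with a toy data structure. This is purely hypothetical: `Item`, `edit`, and `render_layout` are made-up names, the prompt-to-representation step (which would need a model) is left out, and "rendering" here only rasterizes item boxes into a label map instead of producing pixels.

```python
from dataclasses import dataclass, replace
import numpy as np

@dataclass(frozen=True)
class Item:
    """One addressable object in the hypothetical intermediate representation."""
    label: str
    box: tuple  # (x0, y0, x1, y1) in pixels, end-exclusive
    z: int      # stacking order; higher values are drawn on top

def edit(scene, label, **changes):
    """Change a single item, leaving every other item untouched."""
    return [replace(it, **changes) if it.label == label else it for it in scene]

def render_layout(scene, height, width):
    """Stand-in for the render stage: rasterize items into a label map.

    A real system would pass this layout to an image generator; here each
    pixel just records the index (into `scene`) of the topmost item, or -1.
    """
    canvas = np.full((height, width), -1, dtype=int)
    for idx in sorted(range(len(scene)), key=lambda i: scene[i].z):
        x0, y0, x1, y1 = scene[idx].box
        canvas[y0:y1, x0:x1] = idx
    return canvas
```

The point of the design is that `edit` touches one item and the render step is rerun from the structured scene, rather than regenerating or inpainting the whole image to move, say, only the rider.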