| HN Mirror



	by cwkoss 1037 days ago
	Very cool. It would be interesting to train a model on images with alpha channels, so outputs would be automatically masked and more easily composable. But maybe masking is so good these days that it would be futile? When a user does img2img on a layer, does it use the context from other visible layers in the generation?
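To illustrate why alpha channels would make generated layers easier to compose, here is a minimal sketch of the standard Porter-Duff "over" operator. It assumes straight (non-premultiplied) RGBA values in the range 0 to 1; the names `over` and `composite_layers` are hypothetical and not taken from the tool under discussion.

```python
from typing import Sequence, Tuple

# One pixel as (r, g, b, a), straight (non-premultiplied) alpha, each in [0, 1].
RGBA = Tuple[float, float, float, float]


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Composite one foreground pixel over one background pixel."""
    fr, fgreen, fb, fa = fg
    br, bgreen, bb, ba = bg

    # Resulting coverage: foreground plus whatever background still shows through.
    out_a = fa + ba * (1.0 - fa)
    if out_a == 0.0:
        # Both pixels fully transparent: colour is undefined, return clear black.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(f: float, b: float) -> float:
        # Weight each colour by its own alpha, then un-premultiply by out_a.
        return (f * fa + b * ba * (1.0 - fa)) / out_a

    return (blend(fr, br), blend(fgreen, bgreen), blend(fb, bb), out_a)


def composite_layers(pixels_bottom_to_top: Sequence[RGBA]) -> RGBA:
    """Fold a stack of same-position pixels, bottom layer first."""
    result: RGBA = (0.0, 0.0, 0.0, 0.0)  # start from a fully transparent canvas
    for pixel in pixels_bottom_to_top:
        result = over(pixel, result)
    return result
```

With a model that emitted alpha directly, each generated layer could be passed through something like `composite_layers` without a separate segmentation step; without alpha, the mask has to come from a tool such as SAM first.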

3 comments

dheera 1037 days ago

For composing, this approach works pretty well; maybe the author should consider making a UI for it:

https://multidiffusion.github.io/


mottiden 1037 days ago

Thanks for posting. Really interesting.


Zetobal 1037 days ago

Segmentation is solved... https://github.com/RockeyCoss/Prompt-Segment-Anything


michaelt 1037 days ago

Segment Anything is neat, but segmentation is far from solved.

If the user generates a picture of a horse and rider to add onto another composition, they probably want to include the saddle.


GaggiX 1037 days ago

SAM is also conditioned on points: if it's ambiguous what you want to mask, you can add a point on the saddle and the model will include it without a problem. Segmentation is pretty much solved; I agree with the parent post.


bavell 1037 days ago

IME I haven't gotten great results using SAM; maybe it was just the images I was using? They weren't great quality, and it seemed to struggle with low-contrast areas.


Zetobal 1036 days ago

If it's audio, images, CG or video, it's almost always GIGO.


mdp2021 1037 days ago

> Would be interesting to train a model on images with alpha channels

It would be even more interesting to have an intermediate ANN stage holding an ontology of the (finally) represented content, so that individual items could be changed.

An internal representation of qualified structured items in space as part of the chain. Prompt > accessible internal representation > render.
