Hacker News
by vunderba 723 days ago
Nice job. I actually experimented with a chat-driven, instruct2pix sort of interface that connected via an API to a Stable Diffusion backend. The big problem is that it's difficult to know whether the inpainting job you've done is satisfactory to the user.

This is why, when you're doing this sort of traditional inpainting in automatic1111, you usually generate several iterations with various mask blurs, whole picture vs. only masked section, and padding. And of course, the optimal inpainting checkpoint model depends on whether the original image is photorealistic or illustrated, etc.
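A minimal sketch of that sweep, expressed as a grid of request payloads rather than actual calls. The field names (`mask_blur`, `inpaint_full_res` for "only masked", `inpaint_full_res_padding`) are modeled on the automatic1111 img2img API and are assumptions to check against your version. The sweep values and the helper `build_inpaint_jobs` are hypothetical.

```python
from itertools import product

# Hypothetical sweep values; tune them for your own images.
MASK_BLURS = [4, 8, 16]
INPAINT_AREAS = [False, True]  # False = whole picture, True = only masked
PADDINGS = [32, 64]


def build_inpaint_jobs(init_image_b64, mask_b64, prompt, seed=0):
    """Return one img2img payload per combination of mask blur, inpaint area, and padding.

    Field names are assumed from the automatic1111 img2img API; verify them
    against the version you run before POSTing these payloads.
    """
    jobs = []
    for blur, only_masked, padding in product(MASK_BLURS, INPAINT_AREAS, PADDINGS):
        if not only_masked and padding != PADDINGS[0]:
            # Padding only affects "only masked" runs, so keep a single
            # whole-picture job per blur value instead of redundant duplicates.
            continue
        jobs.append({
            "init_images": [init_image_b64],
            "mask": mask_b64,
            "prompt": prompt,
            "seed": seed,  # fixed seed so runs differ only in the swept settings
            "mask_blur": blur,
            "inpaint_full_res": only_masked,
            "inpaint_full_res_padding": padding,
        })
    return jobs


jobs = build_inpaint_jobs("<base64 image>", "<base64 mask>", "a red sofa")
print(len(jobs))
```

With three blur values, one whole-picture job and two "only masked" jobs per blur, this yields nine candidates to send to the backend and then compare side by side.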


Right now, the inpainting is done on a semantic mask (the output of a segmentation model). For more complex instructions, we also have to support contextual mask generation, which is an active area of research in the field of Visual Language Models. When it comes to performing several iterations, you can also do that at the semantic level or generate a batch of outputs. The SD v1.5 inpainting model is quite weak, and we haven't seen any large-scale open-source inpainting model for a while.
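A minimal sketch of turning a segmentation model's output into an inpainting mask, assuming the model returns one integer class ID per pixel (the label map, class IDs, and helper names here are hypothetical). Dilating the binary mask by a few radii gives a set of mask variants, which is one way to iterate at the mask level rather than only re-sampling the same mask.

```python
def semantic_mask(label_map, target_class):
    """Binary mask (1 = inpaint) for every pixel labeled target_class.

    Assumes label_map is a 2D list of per-pixel class IDs from a segmentation model.
    """
    return [[1 if label == target_class else 0 for label in row] for row in label_map]


def dilate(mask, radius=1):
    """Grow the mask by `radius` pixels (square neighborhood) so edges get repainted too."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Mark every in-bounds pixel within the square window around (y, x).
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def mask_variants(label_map, target_class, radii=(0, 2, 4)):
    """One dilated mask per radius; each can be sent as a separate inpainting job."""
    base = semantic_mask(label_map, target_class)
    return [dilate(base, r) for r in radii]


label_map = [
    [0, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
for m in mask_variants(label_map, target_class=2, radii=(0, 1)):
    print(sum(map(sum, m)))  # 1, then 9
```

A real pipeline would likely operate on image arrays instead of nested lists; plain Python is used here only to keep the sketch dependency-free.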