by LarsDu88 724 days ago

Stable Diffusion Dreambooth.

The basic idea is to pick a rare, unused token (say, f$#sdafad) and then finetune your image generation model on a small set of images of the subject (say, 20 images of your red cap at various angles) while telling it that f$#sdafad refers to your red cap.

Then you can start prompting "f$#sdafad resting on the head of a monkey" and your cap will appear on a monkey's head.
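To make the token trick concrete, here is a minimal sketch of the data side only, not the finetuning itself. Every photo of the subject gets the same caption containing the rare token, and after finetuning the token is dropped into ordinary prompts. The function names, file names, the "a photo of" caption template, and the class noun "cap" next to the token are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingExample:
    image_path: str  # one photo of the subject
    prompt: str      # caption that ties the photo to the rare token


def build_instance_examples(image_paths, rare_token, class_noun):
    """Pair every subject photo with the same caption containing the rare token."""
    # Assumed caption template; the class noun hints at what kind of thing the token is.
    prompt = f"a photo of {rare_token} {class_noun}"
    return [TrainingExample(path, prompt) for path in image_paths]


def build_inference_prompt(rare_token, class_noun, scene):
    """After finetuning, the token is used like any other word in a prompt."""
    return f"{rare_token} {class_noun} {scene}"


# Hypothetical file names for the roughly 20 photos of the red cap.
paths = [f"red_cap_{i:02d}.jpg" for i in range(20)]
examples = build_instance_examples(paths, "f$#sdafad", "cap")
prompt = build_inference_prompt("f$#sdafad", "cap", "resting on the head of a monkey")
print(examples[0].prompt)
print(prompt)
```

The list of examples is what a finetuning script would consume. The sketch leaves the training loop out, since it depends on the model and library you use.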

The problem with this technique is the finetuning. It can take minutes to hours depending on how many GPUs you have, and it has to be repeated for every new token you want to map to a specific person or object.

Another strategy is autocropping plus generative infill. Take a segmentation model like Meta's "Segment Anything" and use it to segment out the item of interest manually (perhaps a UI could be built to make this a one-step process). Then take the mask and do a generative infill (inpainting) with an image generation model like Stable Diffusion.
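The mask handling in that second approach can be sketched without any model at all. The segmentation mask marks the object, so the infill model should repaint everything except the object, and the original object pixels are kept in the final image. This toy version uses plain nested lists in place of image arrays so it runs without dependencies. The helper names and the "1 = repaint" convention are assumptions of this sketch, and the generated background stands in for whatever the image model would produce.

```python
def invert_mask(object_mask):
    """Turn an object mask (1 = object) into an infill mask (1 = repaint)."""
    return [[1 - m for m in row] for row in object_mask]


def composite(original, generated, object_mask):
    """Keep original pixels where the mask marks the object, generated pixels elsewhere."""
    return [
        [o if m else g for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, object_mask)
    ]


# Toy 3x3 grayscale "images": 9 is the cap, 0 the old background, 5 the new one.
original = [[0, 9, 0],
            [9, 9, 9],
            [0, 0, 0]]
object_mask = [[0, 1, 0],
               [1, 1, 1],
               [0, 0, 0]]
generated = [[5, 5, 5],
             [5, 5, 5],
             [5, 5, 5]]  # stand-in for the model's output

infill_mask = invert_mask(object_mask)  # what the infill model is told to repaint
result = composite(original, generated, object_mask)
print(result)
```

With real images the same two steps would normally be array operations, but the logic is the same: invert the object mask for the infill model, then paste the original object back so it stays pixel-exact.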