by Lerc 830 days ago
Is the lack of training data the only thing preventing this approach from being applied to both positive and negative prompts together?

What size data set is actually needed? Does it need to be machine-generated, or can you get away with something smaller, perhaps crowdsourced?


You could definitely use this for upsampling negative prompts, though I haven't tested that much. In theory, future text-to-image (T2I) models shouldn't need as much negative prompting. I find it's better to focus on really high-quality positive prompts, since those are closer to the captions the model was trained on.

You can take a look at the dataset here: https://huggingface.co/datasets/roborovski/upsampled-prompts... At a minimum, roughly 5k samples were needed for the smaller ones, filtered from the 95k generated in total.
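The thread doesn't say how the 95k generated samples were filtered down to the ~5k used. As a rough illustration of what such a filtering pass over machine-generated (prompt, upsampled prompt) pairs might look like, here is a minimal sketch. The `Sample` type, `filter_samples` function, and both criteria (drop duplicate prompts, require the upsampled text to be at least `min_ratio` times longer than the original) are hypothetical choices made for illustration, not the author's actual pipeline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str     # short, user-style prompt
    upsampled: str  # machine-generated, more detailed version


def filter_samples(samples: list[Sample], min_ratio: float = 2.0) -> list[Sample]:
    """Hypothetical filter: keep the first acceptable pair per prompt whose
    upsampled text is at least `min_ratio` times longer than the prompt."""
    seen: set[str] = set()
    kept: list[Sample] = []
    for s in samples:
        key = s.prompt.strip().lower()
        if key in seen:
            continue  # already kept a pair for this prompt
        if len(s.upsampled) < min_ratio * len(s.prompt):
            continue  # too little expansion to be a useful training target
        seen.add(key)
        kept.append(s)
    return kept


# Toy usage: the duplicate "a cat" and the barely expanded "a dog" are dropped.
samples = [
    Sample("a cat", "a fluffy orange cat sitting on a windowsill at sunset"),
    Sample("a cat", "a cat"),
    Sample("a dog", "a dog!"),
]
kept = filter_samples(samples)
print(len(kept), f"kept; the thread's 5k/95k ratio is {5_000 / 95_000:.1%}")
```

Real filtering for prompt upsampling would more likely use quality scores or manual review; simple length and dedup heuristics are shown only because they are easy to follow.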