Hacker News new | ask | show | jobs
by doctorpangloss 667 days ago
> CLIP is just for an embedding for images and text, right?

Yes, which is what makes text-to-image generation possible. You can go ahead and try using Stable Diffusion models, or even the incredibly high quality Flux, with no text "embedding" (or whatever you want to call it), and judge for yourself if those outputs are useful.

1 comments

I get that, but my question is, “how can using the guidance from CLIP possibly make the resulting image infringe on copyright?”. I’m not saying that the CLIP part isn’t necessary for it to be useful.