Hacker News
by andsens 1387 days ago
Uhm. You’re basically asking how the entire NN works. There is no easy explanation for that.

I understand neural networks, embeddings, convolutions, etc. The part that's unclear to me is specifically how the textual embeddings are linked into the img-to-img network that is trying to reduce the noise. In other words, I am missing how the process is 'conditioned upon' the text. (I lack the same understanding for conditional GANs as well.)

If the answer is just that the textual embeddings are also fed as simple inputs to the network, then I already understand.
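
For what it's worth, here is a minimal sketch of one common way the text gets wired in, as far as I understand it: cross-attention. In latent-diffusion-style text-to-image models, the denoiser's intermediate image features act as queries, and the text encoder's token embeddings supply the keys and values, so every image position can pull in information from the prompt. Everything below is an illustrative assumption rather than any model's actual code: the helper names (`matmul`, `random_matrix`, `cross_attention`), the tiny dimensions, and the random matrices standing in for learned weights. It also skips multiple heads, normalization, and the output projection that real implementations have.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or features."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def cross_attention(image_feats, text_embs, w_q, w_k, w_v):
    """Let every image position attend over the text tokens.

    image_feats: n_pixels x d_img (flattened feature map inside the denoiser)
    text_embs:   n_tokens x d_txt (output of the text encoder)
    """
    q = matmul(image_feats, w_q)  # queries come from the image features
    k = matmul(text_embs, w_k)    # keys come from the text
    v = matmul(text_embs, w_v)    # values come from the text
    d = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose k
    scores = matmul(q, k_t)               # n_pixels x n_tokens
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, v)         # n_pixels x d_img
    return attended, weights


rng = random.Random(0)
n_pixels, d_img, n_tokens, d_txt, d_attn = 4, 8, 3, 6, 5

image_feats = random_matrix(n_pixels, d_img, rng)
text_embs = random_matrix(n_tokens, d_txt, rng)
w_q = random_matrix(d_img, d_attn, rng)
w_k = random_matrix(d_txt, d_attn, rng)
w_v = random_matrix(d_txt, d_img, rng)  # maps text straight to d_img (simplification)

attended, weights = cross_attention(image_feats, text_embs, w_q, w_k, w_v)

# Residual connection: the text-derived signal is added onto the image features.
conditioned = [
    [f + a for f, a in zip(feat_row, att_row)]
    for feat_row, att_row in zip(image_feats, attended)
]
```

So it is a bit more than "fed as simple inputs": the text never gets concatenated onto the image. Instead, each spatial position computes its own weighting over the prompt tokens, and the weighted mix is added back into the feature map at several layers of the denoiser. Change `text_embs` and `conditioned` changes, which is all "conditioned upon the text" means here.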

Might be worth looking through the dataset it was trained on, here's one example: https://laion-aesthetic.datasette.io/laion-aesthetic-6pls/im...

So the model understands (kinda) who Bob Moog is, so when you include "Bob Moog" in the prompt, the model knows what you are looking for.

Why did they unnecessarily re-index a smaller subset of LAION Aesthetic? You can search _all_ of LAION using the pre-built faiss indices from LAION.

https://rom1504.github.io/clip-retrieval/?back=https%3A%2F%2...

is a hosted version, but you can download and host it yourself as well.