by tmabraham
1060 days ago
Our model generates CLIP image embeddings from fMRI signals. Those embeddings can be used for retrieval (using cosine similarity, for example) or passed into a pretrained diffusion model that takes CLIP image embeddings and generates an image (it's a bit more complicated than that, but that's the gist; read the blog post for more info). So we are doing both reconstruction and retrieval. The reconstruction achieves SOTA results. The retrieval demonstrates that the image embeddings contain fine-grained information: the model isn't just saying "this is a picture of a teddy bear" and leaving the diffusion model to generate a random teddy bear. I think the zebra example really highlights that. The generated image embedding matches the exact zebra image the person saw. If the model could only say "it's a zebra picture," it wouldn't be able to do that. Instead, the model is picking up on fine-grained info present in the fMRI signal. The blog post has more information, and the paper itself has even more, so please check it out! :)
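
For readers who want to see what retrieval by cosine similarity looks like in practice, here is a minimal sketch. It is not the project's code: the function name `retrieve_top1`, the 768-dimensional embedding size, and the random arrays standing in for predicted and candidate CLIP image embeddings are all assumptions made for illustration.

```python
import numpy as np


def retrieve_top1(pred_embeddings, candidate_embeddings):
    """Return, for each predicted embedding, the index of the most similar
    candidate under cosine similarity, plus the full similarity matrix."""
    # L2-normalize rows so that a dot product equals cosine similarity.
    pred = pred_embeddings / np.linalg.norm(pred_embeddings, axis=1, keepdims=True)
    cand = candidate_embeddings / np.linalg.norm(candidate_embeddings, axis=1, keepdims=True)
    sims = pred @ cand.T  # shape: (num_predictions, num_candidates)
    return sims.argmax(axis=1), sims


# Hypothetical stand-in data: 5 candidate "image embeddings" and 5
# "fMRI-predicted embeddings" that are noisy copies of them.
rng = np.random.default_rng(0)
candidates = rng.normal(size=(5, 768))
predicted = candidates + 0.1 * rng.normal(size=(5, 768))

best, sims = retrieve_top1(predicted, candidates)
print(best)  # index of the retrieved candidate for each prediction
```

If the predicted embeddings only carried category-level information, several candidates from the same category would score about equally; picking out the exact image requires the embedding to be closest to that one image's embedding.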