by RC_ITR
1060 days ago
> To achieve the goals of retrieval and reconstruction with a single model trained end-to-end, we adopt a novel approach of using two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior).

You can think of contrastive learning as two separate models that take different inputs and output vectors of the same length. Both models are trained on pairs of training data (in this case fMRI scans and the images the subject was looking at), so that matching pairs end up with similar vectors. What the LAION-5B retrieval work shows is that they did a good enough job of this training that the models are really good at creating similar vectors for nearly any image and fMRI pair.

Then they make a prior model, which basically says: "our fMRI vectors are essentially image vectors with some unknown amount of randomness in them (representing the difference between the two contrastive learning models). Let's train a model to remove that randomness, and then we have image vectors."

So yes, this is an impressive result at first glance and not some overfitting trick. It's also sort of bread and butter at this point (replace fMRI with "text" and that's roughly what Stable Diffusion is). There'll be lots of these sorts of results coming out soon.

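To make the contrastive learning part concrete, here is a minimal sketch of a generic CLIP-style symmetric contrastive loss in plain Python. It is not necessarily the exact loss used in the paper, and the function names, the `temperature` value, and the toy 3-dimensional vectors are all assumptions made for illustration. The point is only that correctly matched fMRI/image pairs give a lower loss than mismatched ones.

```python
import math

def normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_matrix(a_vecs, b_vecs):
    # Entry [i][j] is the cosine similarity between a_vecs[i] and b_vecs[j].
    a = [normalize(v) for v in a_vecs]
    b = [normalize(v) for v in b_vecs]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]

def cross_entropy_diag(logits):
    # Mean of -log softmax(row i)[i]: each row should "pick" its own partner.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def contrastive_loss(fmri_vecs, image_vecs, temperature=0.1):
    # Symmetric loss: fMRI -> image and image -> fMRI directions are averaged.
    sims = cosine_matrix(fmri_vecs, image_vecs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))

# Toy embeddings (hypothetical): row i of each list belongs to the same trial.
image_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_fmri = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
shuffled_fmri = [aligned_fmri[1], aligned_fmri[2], aligned_fmri[0]]

loss_aligned = contrastive_loss(aligned_fmri, image_vecs)
loss_shuffled = contrastive_loss(shuffled_fmri, image_vecs)
print(loss_aligned < loss_shuffled)
```

In a real setup the two lists of vectors would come from the two trained encoders, and the loss would be minimized by gradient descent over their weights.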
You can see the comparison in performance between LAION-5B retrieval and actual reconstructions in the paper. When retrieving from a large enough database like LAION-5B, we can get images that are quite similar to the seen images in terms of high-level content, but not so similar in low-level details (relative position of objects, colors, texture, etc.). Reconstruction with diffusion models does much better in terms of low-level metrics.
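For anyone unsure what "retrieval" means here, a rough sketch: take the vector predicted from the fMRI scan and return the database images whose vectors are closest by cosine similarity. The database entries, the vector values, and the `retrieve` helper below are made up for illustration; a real system would search billions of precomputed image embeddings with an approximate nearest-neighbor index.

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors of equal length.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, database, k=1):
    # Rank every database entry by similarity to the query, keep the top k names.
    ranked = sorted(
        database.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical image embeddings keyed by a description of the image.
database = {
    "dog_on_grass": [0.9, 0.1, 0.0],
    "cat_on_sofa": [0.1, 0.9, 0.1],
    "city_street": [0.0, 0.2, 0.9],
}

# Hypothetical vector predicted from an fMRI scan.
predicted = [0.8, 0.3, 0.1]
top = retrieve(predicted, database, k=2)
print(top)
```

This also shows why retrieval matches high-level content better than low-level detail: it can only return an image that already exists in the database, so layout, colors, and texture are whatever that nearest image happens to have.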