by Aransentin
1204 days ago
Because pretty much everybody who reads the article will come away with a grossly exaggerated idea of what the system is actually capable of. If Stable Diffusion was intentionally added "for flair" and really is unnecessary, then I would absolutely say that the researchers were being intentionally deceptive.

Even if we do a massive goalpost-move and grant that the system is only matching the label "dog" to a brain scan of a person looking at a dog, we would need to see actual statistics on its labelling accuracy before judging it on those terms. If the images in the paper are cherry-picked(1), it could easily be able to extract only a handful of bits, or none at all, and the entire thing could very well turn out to be replicable from random noise.

(1) Note that the paper even states "We generated five images for each test image and selected the generated images with highest PSMs [perceptual similarity metrics].", so it directly admits that the presented images are cherry-picked at least once.
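To make the cherry-picking concern in footnote (1) concrete, here is a minimal simulation sketch. It is not the paper's method: Pearson correlation stands in for the paper's PSMs, the "images" are vectors of Gaussian noise, and every name and parameter (`simulate`, `n_pixels`, `n_candidates`, the seed) is an assumption chosen for illustration. It only shows that picking the best of five candidates raises the reported similarity even when no candidate carries any information about the target.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def noise_image(rng, n_pixels):
    """A stand-in 'image': independent Gaussian noise, one value per pixel."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]


def simulate(n_trials=2000, n_pixels=64, n_candidates=5, seed=0):
    """Compare the score of one random candidate with the best of several.

    Every candidate is pure noise, so any similarity to the target is chance.
    Returns (mean score of a single candidate, mean score of the best one).
    """
    rng = random.Random(seed)
    single_scores, best_scores = [], []
    for _ in range(n_trials):
        target = noise_image(rng, n_pixels)
        scores = [
            pearson(target, noise_image(rng, n_pixels))
            for _ in range(n_candidates)
        ]
        single_scores.append(scores[0])   # no selection
        best_scores.append(max(scores))   # best-of-n selection
    return statistics.fmean(single_scores), statistics.fmean(best_scores)


mean_single, mean_best = simulate()
print(f"mean similarity, one random candidate: {mean_single:.3f}")
print(f"mean similarity, best of five:         {mean_best:.3f}")
```

The unselected average sits near zero, while the best-of-five average is clearly positive, purely as a selection effect. How large that effect is for the real PSMs and real generated images is exactly what the missing accuracy statistics would have to show.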
|
We can take fMRI scans while people are looking at images and generate blurry blobs that do indeed resemble the images spatially.
We can predict a text label for the image the person is looking at using another technique.
If you run SD on the text labels alone and generate an image, you get the semantic content, but not the spatial content.
If you combine the blurry image and the text label and run them through an LDM, you get pictures that more closely match both the semantic and spatial characteristics of the images shown to the person.