I don’t get the criticism here. Normally I’d be the first to err on the side of skepticism, but this work seems above board.

I think the confusion is that this model is generating “teddy bear” internally, not a photo of a teddy bear. I.e. the diffusion part was added for flair, not to generate the details of the images that exist inside your mind. They could just as easily have run print("teddy bear"); they’re simply sending it to diffusion instead of printing it to the console. The fact that it can correctly discern between a dozen different outputs is pretty remarkable. That’s all this is showing, but that’s enough. It’s not really a “gotcha” to say that it’s showing an image from the training set; they could have replaced diffusion with a static image of a teddy bear.

It sounds like this is many readers’ first time confronting the fact that scientists need to do these kinds of projects to get funding. As long as they’re not being intentionally deceptive, it seems fine. There’s a line between this and that ridiculous “rat brain flies plane” myth, and this seems above it.

Disclaimer: I should probably read the paper in detail before posting this, but the criticism that “the building looks like a training image” is mostly what I’m responding to.

There are only so many topics one can think about, and having a machine draw a dog when I’m thinking about my dog Pip is some next-level sci-fi “we live in the future” stuff. Even if it doesn’t look like Pip, does it really matter? Besides, it’s only a matter of time until they correlate which parts of the brain are more prone to activating for specific details of the image you’re thinking about. Getting pose and color right would go a long way. So this is a resolution problem: we need more accurate brain-sampling techniques, e.g. Neuralink. Then I’m sure diffusion will get a lot more of those details correct.
Even if we make a massive goalpost move and grant that the system is only identifying the label "dog" from a brain scan of a person looking at a dog, we would need to see actual statistics on its labelling accuracy before judging it on those terms. If the images in the paper are cherry-picked (1), it could easily be extracting only a handful of bits, or no bits at all, and the entire result could very well turn out to be replicable from random noise.
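To make the "handful of bits" point concrete, here is a rough sketch in Python (standard library only). It assumes, hypothetically, a fixed set of 12 equally likely labels (the "dozen different outputs" mentioned above) and a decoder that either names the right label or effectively guesses; the helper names and the 50-scan/10-correct figures are illustrative, not taken from the paper. A perfect 12-way label carries at most log2(12), about 3.58 bits, per scan, and the binomial tail shows how often uniform guessing would score at least as well.

```python
import math


def chance_p_value(n_trials: int, n_correct: int, n_classes: int) -> float:
    """One-sided binomial p-value: the probability of getting at least
    n_correct labels right out of n_trials by guessing uniformly among
    n_classes (hypothetical helper, assumes independent trials)."""
    p = 1.0 / n_classes
    return sum(
        math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


def max_bits_per_trial(n_classes: int) -> float:
    """Upper bound on the information in one perfectly decoded k-way label."""
    return math.log2(n_classes)


# Hypothetical numbers, not from the paper: 12 labels, 50 test scans, 10 correct.
print(f"at most {max_bits_per_trial(12):.2f} bits per scan")
print(f"p = {chance_p_value(50, 10, 12):.3g} for 10/50 correct by chance")
```

Without the real accuracy numbers, there is nothing to plug in here, which is exactly the objection.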
(1) Note that the paper states outright: "We generated five images for each test image and selected the generated images with highest PSMs [perceptual similarity metrics].", so it directly admits that the presented images are cherry-picked at least once.
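The best-of-five selection inflates the reported similarity even for a system with no signal at all. A minimal simulation, assuming (hypothetically) that each generated image's similarity score is independent standard-normal noise with no relation to the target: keeping the best of five per test image moves the average score from about 0 to about 1.16, the expected maximum of five standard normals. The function name and the normal-noise model are illustrative stand-ins, not the paper's PSM.

```python
import random
import statistics


def mean_selected_score(n_images: int, k: int, rng: random.Random) -> float:
    """Average score when the best of k pure-noise generations is kept per image.

    Each score is a standard-normal draw standing in for a similarity metric
    on outputs that carry no information about the target image."""
    return statistics.fmean(
        max(rng.gauss(0.0, 1.0) for _ in range(k)) for _ in range(n_images)
    )


rng = random.Random(0)
single = mean_selected_score(10_000, 1, rng)     # one generation per image
best_of_5 = mean_selected_score(10_000, 5, rng)  # keep the best of five
print(f"k=1: {single:.2f}  k=5: {best_of_5:.2f}")
```

A fair comparison would report unselected scores, or run a noise baseline through the same best-of-five selection.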