by sillysaurusx 1199 days ago
I don’t get the criticism here. Normally I’d be the first to err on the side of skepticism, but this work seems above board.

I think the confusion is that this model is generating “teddy bear” internally, not a photo of a teddy bear. I.e. the diffusion part was added for flair, not to generate the details of the images that exist inside your mind. They could just as easily have run print("teddy bear"), but they’re sending it to diffusion instead of printing it to console.

The fact that it can correctly discern between a dozen different outputs is pretty remarkable. And that’s all that this is showing. But that’s enough.

It’s not really a “gotcha” to say that it’s showing an image from the training set. They could have replaced diffusion with showing a static image of a teddy bear.

It sounds like this is many readers’ first time confronting the fact that scientists need to do these kinds of projects to get funding. As long as they’re not being intentionally deceptive, it seems fine. There’s a line between this and that ridiculous “rat brain flies plane” myth, and this seems above it.

Disclaimer: I should probably read the paper in detail before posting this, but the criticism of “the building looks like a training image” is mostly what I’m responding to. There are only so many topics one can think about, and having a machine draw a dog when I’m thinking about my dog Pip is some next-level sci-fi “we live in the future” stuff. Even if it doesn’t look like Pip, does it really matter?

Besides, it’s a matter of time till they correlate which parts of the brain are more prone to activating for specific details of the image you’re thinking about. Getting pose and color right would go a long way. So this is a resolution problem; we need more accurate brain sampling techniques, i.e. Neuralink. Then I’m sure diffusion will get a lot more of those details correct.

1 comments

Because pretty much everybody that reads the article will have taken away a grossly exaggerated idea of what the system is actually capable of. If Stable Diffusion was intentionally added "for flair" and really is unnecessary, then I would absolutely say that the researchers were being intentionally deceptive.

Even if we do a massive goalpost-move and grant that the system is only identifying the label "dog" with a brain scan of a person looking at a dog, we would need to see actual statistics of its labelling accuracy before judging it in that way. If the images in the paper are cherry-picked(1), it could easily be only able to extract a handful of bits to no bits at all, and the entire thing could very well turn out to be replicable from random noise.

(1) Note that the paper even states "We generated five images for each test image and selected the generated images with highest PSMs [perceptual similarity metrics].", so it even directly admits that the presented images are cherry-picked at least once.
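
To make the selection effect concrete, here is a rough sketch of how much picking the best of five candidates can inflate a score even when there is no signal at all. It assumes, purely for illustration, that a pure-noise generator gets a similarity score drawn uniformly from [0, 1]; that distribution and the function name mean_best_of_k are made up for this example and are not the paper's metric.

```python
import random
import statistics


def mean_best_of_k(k, trials=20000, seed=0):
    """Average of the best score among k pure-noise candidates.

    Assumption (hypothetical): each candidate's similarity score is an
    independent draw from Uniform(0, 1), i.e. the generator knows nothing.
    """
    rng = random.Random(seed)
    return statistics.fmean(
        max(rng.random() for _ in range(k)) for _ in range(trials)
    )


# Score of one random candidate versus the best of five random candidates.
single = mean_best_of_k(1)
best_of_five = mean_best_of_k(5)

print(f"one candidate: {single:.3f}")
print(f"best of five:  {best_of_five:.3f}")
```

Under that assumption the expected best-of-k score is k / (k + 1), so best-of-five averages about 0.83 against 0.5 for a single draw. Selection alone moves the number, which is why the unselected statistics matter.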

It’s more like this:

We can take fMRI scans when people are looking at images and generate blurry blobs that do indeed resemble the images spatially.

We can predict a text label of the image the person is looking at using another technique.

If you use SD just on the text labels and you generate an image, you get the semantic content, but not the spatial content.

If you combine the image and the text label and run it through an LDM then you get pictures that more closely match both the semantic and spatial characteristics of the images shown to the person.

That’s my understanding as well. It all depends on whether their technique really can do this. If it can, it’s solid work imo. If it can’t (better than random chance), then it’s bunk.
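
One way to pin down "better than random chance" is an exact one-sided binomial test on the label predictions. This is only a sketch: the twelve-way setup borrows the "dozen different outputs" figure from upthread, and the counts (30 or 9 correct out of 100) are invented for illustration, not taken from the paper.

```python
from math import comb


def p_value_at_least(correct, total, chance):
    """Probability of getting `correct` or more hits out of `total`
    if every prediction were an independent guess with success
    probability `chance` (exact binomial upper tail)."""
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


# Hypothetical numbers: 12 equally likely labels, 100 test scans.
chance = 1 / 12
p_strong = p_value_at_least(30, 100, chance)  # 30 correct, far above ~8 expected
p_weak = p_value_at_least(9, 100, chance)     # 9 correct, about what guessing gives

print(f"30/100 correct: p = {p_strong:.2e}")
print(f" 9/100 correct: p = {p_weak:.2f}")
```

A tiny p-value for the first case would mean guessing is a poor explanation; the second case is indistinguishable from guessing. The test assumes independent trials and equally likely labels, which real fMRI data may not satisfy.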

There’s not much way to know other than to try it and see. But that’s true of almost every paper in ML. Some of them suck, some of them are great, but they all contribute something in their own way. Even the “rat brain flies plane” paper (as much as I despise it) showed that you can change the values of mice neurons in a lab setting.