| HN Mirror


	by Hakkin 1247 days ago
	I'm definitely not an expert in this subject, but even if the model is overfitted, doesn't the fact that it can pull out the similar images at all give credit to the idea that a larger, non-overfitted model could actually work as the paper describes? It means that there does exist some correlation between the shown subject, the captured fMRI data, and the resulting location in latent space.

4 comments

Double_a_92 1247 days ago

The output part is basically nonsense. It would be more honest if the output were text, e.g. "Teddybear" instead of a bad image of a random teddy bear.

Hakkin 1247 days ago

In this specific case I agree: since the model may be overfitted, it seems like it's currently just a glorified object classifier based on what was in the training data. But the fact that it works at all may indicate that the underlying idea has merit. They would probably have to train a much larger network to see if it's able to separate features distinctly enough using the input fMRI data to be useful.

gus_massa 1247 days ago

The problem is that it's impossible to know what is in the fMRI data and what is hallucinated by the reconstruction.

In this case, the real bear has a blue ribbon and the "reconstructed" bear has a red ribbon. Is the ribbon in the fMRI data and the computer chose the wrong color, or did most of the images in the training set have ribbons and the computer just added one?

Imagine something like this is used in the future to produce something like https://en.wikipedia.org/wiki/Facial_composite . People may give too much importance to the details and arrest someone only because the computer imagined some detail, like the logo on a baseball cap.

mkagenius 1247 days ago

> Imagine something like this is used in the future to produce something like https://en.wikipedia.org/wiki/Facial_composite . People may give too much importance to the details and arrest someone only because the computer imagined some detail, like the logo on a baseball cap.

Wow, that went from "tech not working" to "tech might kill someone" super fast.

YeGoblynQueenne 1247 days ago

In the real world when tech doesn't work people die.

OP is right to be concerned. This kind of tech (magickal mind-reading AI?!) is going to be bought up by security agencies, who will not understand its limitations and will misuse it to accuse people of crimes they aren't related to.

There is ample precedent. Just for one recent example see plans to use an "AI lie detector" based on discredited pseudo-science at EU borders:

https://theintercept.com/2019/07/26/europe-border-control-ai...

gus_massa 1247 days ago

Exactly.

For example, please read this old article very carefully: "Police Are Using DNA to Generate 3D Images of Suspects They've Never Seen" https://www.vice.com/en/article/pkgma8/police-are-using-dna-... HN discussion https://news.ycombinator.com/item?id=33527901 (6 points | 3 months ago | 1 comment)

The picture is a high-resolution image that makes the system look accurate. They don't use the AI buzzword, but my guess is it's only a matter of time. Anyway, the important paragraph is:

> Seeing the composite image with no context or knowledge of DNA phenotyping, can mislead people into believing that the suspect looks exactly like the DNA profile. “Many members of the public that see this generated image will be unaware that it's a digital approximation, that age, weight, hairstyle, and face shape may be very different, and that accuracy of skin/hair/eye color is approximate,” Schroeder said.

moron4hire 1247 days ago

It's not an object classifier at all. They had to text-prompt the system, first. I think the general idea is using the fMRI data as the pseudorandom initialization for the latent diffusion model to explore.

From what I understand, regular Stable Diffusion starts by generating noise and then hallucinating modifications of that noise to make it less noisy. The more you let it run, the better the results.

So instead of just starting with a meaningless random noise, they're using the fMRI data to start. But if you didn't have the text prompt, you wouldn't get the right image. If you were looking at a cat but told it you were looking at a house, you'd probably end up with a small house, similar to one in its training set, positioned roughly where the cat was located in the original image.

Hakkin 1247 days ago

Briefly reading the paper, it seems they trained 2 models (using data from different stages in the visual cortex) to generate latent vectors for both the visual and textual representations of the fMRI data, then feed those into Stable Diffusion. Those are the models that would be overfit in this case, so instead of those models being able to encode features like "toy, animal, fluffy, brown, ears, nose, arms, legs" individually, it's likely just encoding all of those features combined into a generic "teddy bear" because the input dataset is too small. Obviously this is an oversimplification, but hopefully you get what I mean. I didn't mean it was literally an object classifier, but that a model like this, with a dataset so small, does not have the ability to extrapolate fine details. With a larger dataset and more training, it may be able to actually do that.

dr_dshiv 1247 days ago

My colleagues did the same, but with EEG. This makes the technique much more accessible: https://arxiv.org/abs/2302.10121

One open question in the field: how to assess the alignment of the AI outcomes across different methods?

angusturner 1247 days ago

Largely agree with this, although I think it would be interesting to formulate in terms of: "what is the mutual information between the fMRI scan and the stimulus".

i.e., is there actually more information than a few bits encoding a crude object category, from which Stable Diffusion then hallucinates the rest (or which it uses to regurgitate an over-fit image)?

Or are there many bits, corresponding spatially to different regions of the stimulus - allowing for some meaningful degree of generalization.
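
To make the "few bits vs. many bits" question concrete, here is a minimal sketch of mutual information for a discrete toy case. Everything in it is an assumption for illustration: the stimulus is one of 10 equally likely categories, the "scan" is a hypothetical discrete readout, and none of the numbers come from the paper.

```python
import numpy as np

def mutual_information_bits(joint):
    """Mutual information I(X;Y) in bits from a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # normalize to a distribution
    px = joint.sum(axis=1, keepdims=True)      # marginal of the stimulus
    py = joint.sum(axis=0, keepdims=True)      # marginal of the readout
    nz = joint > 0                             # skip zero cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

n = 10  # hypothetical number of object categories

# Readout identifies the category exactly, and nothing else.
perfect = np.eye(n) / n
# Readout is statistically independent of the stimulus.
useless = np.full((n, n), 1.0 / n**2)

mi_perfect = mutual_information_bits(perfect)
mi_useless = mutual_information_bits(useless)
print(f"category-only readout: {mi_perfect:.2f} bits")
print(f"independent readout:   {mi_useless:.2f} bits")
```

In this toy setup a readout that only pins down one of 10 categories carries log2(10), about 3.32 bits, no matter how detailed the generated image looks. The interesting case would be a measured value well above that.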

hiddencost 1247 days ago

Nope.

If you train a model where the input is an integer between 1 and 10, and the output is a specific image from a set of ten, the model will be able to get zero loss on the task. That is what's happening here.
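
A minimal sketch of that argument, assuming ten made-up random "images" and a plain least-squares linear model (both are stand-ins chosen for illustration, not anything from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_pixels = 10, 64                  # hypothetical sizes

images = rng.random((n_items, n_pixels))    # ten made-up target "images"
inputs = np.eye(n_items)                    # the integers 1..10 as one-hot rows

# Least-squares linear map from input to image; with one-hot inputs this
# amounts to storing each image in its own row of the weight matrix.
weights, *_ = np.linalg.lstsq(inputs, images, rcond=None)

train_loss = float(np.mean((inputs @ weights - images) ** 2))
print(f"training loss: {train_loss:.2e}")
```

The training loss comes out at (numerically) zero, but that is pure memorization: there is no eleventh input to test on, so it says nothing about generalization.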

geysersam 1247 days ago

Yes but the input isn't an integer from 1 to 10 right? It's MRI data.

Although it seems they're only able to extract the subject of the brain activity, not any actual "pictures".

darawk 1247 days ago

Are you saying the demonstrated results are all in sample? Because this is definitely not true for out of sample data. And the GP comment implies that there is in fact a validation/holdout set.

qumpis 1247 days ago

I'm also confused by this. If everything was done properly, test results on the holdout set would've been shown. Wasn't that the case?

radu_floricica 1247 days ago

It's still a legitimate direction to pursue. Once you get to large enough training sets, it's basically the same way our own brains work. We don't perceive or remember all the details of a building - just "building, style 19B", plus a few extra generic parameters like distance, angle, color and so on. Totally manageable for deep learning to recognize, and perhaps even combine.

williamcotton 1246 days ago

We performed visual reconstruction from fMRI signals using LDM in three simple steps as follows (Figure 2, middle). The only training required in our method is to construct linear models that map fMRI signals to each LDM component, and no training or fine-tuning of deep-learning models is needed. We used the default parameters of image-to-image and text-to-image codes provided by the authors of LDM 2, including the parameters used for the DDIM sampler. See Appendix A for details.
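
For readers wondering what "linear models that map fMRI signals to each LDM component" could look like in practice, here is a rough sketch using closed-form ridge regression on synthetic data. This is not the authors' code: the choice of ridge, the `fit_ridge` helper, the array sizes, the regularization strength, and the random data are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 200, 50, 8      # hypothetical sizes

# Synthetic stand-ins: "fmri" plays the voxel responses, "latents" plays
# the target latent vectors, related by a hidden linear map plus noise.
true_map = rng.normal(size=(n_voxels, latent_dim))
fmri = rng.normal(size=(n_trials, n_voxels))
latents = fmri @ true_map + 0.01 * rng.normal(size=(n_trials, latent_dim))

# Fit on the first 150 trials, evaluate on the 50 held-out trials.
W = fit_ridge(fmri[:150], latents[:150], alpha=1e-3)
pred = fmri[150:] @ W
heldout_mse = float(np.mean((pred - latents[150:]) ** 2))
print(f"held-out MSE: {heldout_mse:.2e}")
```

The held-out error is the number that matters for the overfitting discussion in this thread; with real fMRI data there would be far more voxels than trials, which is where the regularization term starts doing real work.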

csomar 1247 days ago

But unless they tested this on a single human being, doesn't this mean that we can read brains (it's just that this one particular reader is bad)?

lproven 1244 days ago

From the paper, it's four people from the NSD:

« We analyzed data for four of the eight subjects who completed all imaging sessions (subj01, subj02, subj05, and subj07) »

P3 here: https://www.biorxiv.org/content/biorxiv/early/2022/11/21/202...

singularity2001 1247 days ago

I am pretty sure that this is just per person, so all it does is categorize complex brain patterns of one person into 10 category numbers and then do some hula hoop to display the numbers.

thedudeabides5 1247 days ago

Yes.

It means there may be signal in the noise. Even if it's overfitting. Which makes sense.

A sufficiently granular map of the human brain ought to be readable, if you know what the input and output signals are.

chaxor 1247 days ago

If things are being overfit you should typically make the model smaller - not larger.
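
A toy illustration of the point, using polynomial degree as a stand-in for model size. The data, the deterministic alternating "noise", and the degrees are all made up for the example; it shows one case, not a general rule.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Ten training points from a linear relationship plus alternating "noise".
x_train = np.linspace(-1.0, 1.0, 10)
noise = 0.5 * (-1.0) ** np.arange(10)
y_train = 2.0 * x_train + noise

# Held-out points: midpoints between training inputs, noise-free targets.
x_test = (x_train[:-1] + x_train[1:]) / 2
y_test = 2.0 * x_test

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

small = Polynomial.fit(x_train, y_train, deg=1)   # 2 parameters
large = Polynomial.fit(x_train, y_train, deg=9)   # 10 parameters, interpolates

for name, model in [("degree 1", small), ("degree 9", large)]:
    print(name,
          "train MSE:", mse(model, x_train, y_train),
          "test MSE:", mse(model, x_test, y_test))
```

The degree-9 fit drives the training error to essentially zero by passing through every noisy point, and pays for it with a much larger error on the held-out midpoints; the degree-1 fit does the opposite.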
