|
|
|
|
|
by Imnimo
1939 days ago
|
|
Yeah, I mean it's definitely the case that the signal here is that words occur in captions for pictures of text as for pictures of objects. But if you do this same experiment of visualizing dataset examples for other networks (like ones trained on Imagenet), you don't find this sort of multimodal neuron. |
|