by chromanoid
1935 days ago
I agree. But isn't it more probable that some kind of arbitrary "OR" logic forms, rather than the "real abstraction" that the word "multi-modal" implies? I guess we see something like this:
"Photo of spider" -> Hierarchy of pixel soups -> "Photo of spider" OR "Photo of word spider" OR "Spider rear view" OR "Spiderman" OR ... -> [Spider]
What I think the authors want to tell me when calling it multi-modal:
"Photo of spider" -> "Characteristics of a spider" -> [Spider]
"Photo of word spider" -> "Letters S-P-I-D-E-R" -> "Written word spider" -> [Spider]
|
|
First, there is basic generalisation: do these neurons effectively capture things that are within their semantic group (whatever that means) but completely outside the training data? If so, then I guess the answer might be 'yes, in some sense it is like a "real abstraction"'.
Second, and (afaik) currently a more philosophical framing: it isn't obvious whether "(sufficiently advanced) OR logic" and "real abstraction" are actually different. For the purposes of a model like this one, I find it hard to see how they could be. The best the model can do is (roughly speaking) assign neurons to particular concepts, be they ones that fit our mental models of the world or ones that are more "functional". The better a job it does of the former, the more we might be inclined to believe that it is modelling things as we understand them.