by leblancfg
3520 days ago
I agree, although one could imagine that the concepts conveyed in a photograph could be extracted and abstracted as vectors, just as word2vec and its successors do for words. Of course, there is a long way to go before we hit "human understanding" parity, but I think ideas from [1], [2] and [3] could be extrapolated to do just that.

[1] Deep Visual-Semantic Alignments for Generating Image Descriptions - cs.stanford.edu/people/karpathy/cvpr2015.pdf

[2] Deep Learning for Content-Based Image Retrieval - www.research.larc.smu.edu.sg/mlg/papers/MM14-fp336-hoi.pdf

[3] Deep Learning for Content-Based Image Retrieval - www.cs.rutgers.edu/~elgammal/pub/MTA_2014_Saleh.pdf