by keithyjohnson
2461 days ago
"Understanding a sentence is fundamentally different from recognizing an object. But people are trying to use deep learning to do both."
I agree with most of the article, but I think this^^ skips over the different types of networks used to solve perception and language problems. A CNN is very different from, say, word2vec, which isn't a very deep network at all.
It absolutely makes sense to use deep learning for both of these tasks.
In fact, one very effective approach is to use a Siamese network to learn a joint representation space for text and imagery.
It’s really specious and disingenuous to say “boy, vision and language sure seem different but can you believe these DL researchers are using the same tools for both!?”
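For anyone who hasn't seen the joint-representation idea in practice, here is a rough sketch of what I mean, not anyone's actual model. It assumes two linear branches (one per modality) that project into a shared space, cosine similarity as the comparison, and a simple margin-based contrastive loss. The feature sizes, the random weights, and every function name here are made up for illustration; a real system would use learned encoders and a training loop.

```python
import math
import random

def project(weights, features):
    """Linear projection into the shared space: one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]

def normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def contrastive_loss(sim, is_match, margin=0.5):
    """Pull matching pairs together; push mismatched pairs below the margin."""
    if is_match:
        return 1.0 - sim
    return max(0.0, sim - margin)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
EMBED_DIM = 4  # assumed size of the shared space

# Hypothetical input sizes: 6-d text features, 8-d image features.
text_branch = random_matrix(EMBED_DIM, 6, rng)
image_branch = random_matrix(EMBED_DIM, 8, rng)

text_features = [rng.uniform(-1, 1) for _ in range(6)]
image_features = [rng.uniform(-1, 1) for _ in range(8)]

# Both modalities end up as vectors of the same length, so they can be compared.
text_vec = project(text_branch, text_features)
image_vec = project(image_branch, image_features)

sim = cosine(text_vec, image_vec)
loss_if_match = contrastive_loss(sim, is_match=True)
loss_if_mismatch = contrastive_loss(sim, is_match=False)
```

The point is only that the same loss and the same comparison apply to both modalities once each one is mapped into the shared space; training would adjust both branches to lower the loss on labeled pairs.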