by iamaaditya
3727 days ago
Hi syllogism,

Right now everyone working on this is highly focussed on the competition and trying to beat the numbers. For that purpose they would certainly want to stick to predicting the top K answers. For example, see this table:

  Model   Q+I [1]   Q+I+C [1]   ATT 1000   ATT Full
  ACC.    0.2678    0.2939      0.4838     0.4651

Here ATT Full, which uses all the words, performs worse than ATT 1000. (Source: Chen, K., Wang, J., Chen, L. C., Gao, H., Xu, W., & Nevatia, R. (2015). ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering. arXiv preprint arXiv:1511.05960.)

Once the competition is over (in roughly two months), there will be more focus on the actual AI part, where generating the answers would be the right thing to do. There are other papers that use an external knowledge base such as DBpedia, so the "answer word" could certainly be picked up from there.

What you have suggested is a very interesting approach, and I am not aware of any paper that has tried it. Quite a few papers have tried to extend NLP QA to visual QA, but with limited success (except the MetaMind people). I will certainly keep it on my list of ideas to try, and I will update you if I get some results.

P.S. Thank you for creating spaCy. I love it and use it every day!