by syllogism 3728 days ago
Thanks for your work on this. I find the VQA task really interesting.

The classification-based approach is definitely the part I find unsatisfying about this task. The problem, to me, is that it biases the learned models very strongly towards the data that was collected for training and testing.

Has anyone tried outputting a vector from the model and using cosine similarity to pick the nearest word, phrase, or sentence? This seems to work for non-visual QA.[1] Training is performed using noise contrastive estimation. I've discussed this idea with the Virginia Tech team, but I haven't had time to try it, and they seemed a little skeptical.[2]
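
To make the idea concrete, here is a minimal sketch of the prediction step only: the model emits a vector, and the answer is whichever candidate embedding is closest under cosine similarity. The embeddings below are made-up toy values, and the names `cosine`, `nearest_answer`, and `answer_vectors` are hypothetical, not taken from any of the linked code. Training (for example with noise contrastive estimation) is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_answer(predicted, answer_vectors):
    """Return the candidate answer whose embedding is most similar to `predicted`."""
    return max(answer_vectors, key=lambda ans: cosine(predicted, answer_vectors[ans]))


# Toy embeddings, invented for illustration; a real system would use learned vectors.
answer_vectors = {
    "red": [0.9, 0.1, 0.0],
    "two": [0.0, 0.8, 0.2],
    "frisbee": [0.1, 0.0, 0.9],
}

# Stand-in for the vector the VQA model would output for one question/image pair.
predicted = [0.7, 0.2, 0.1]
best = nearest_answer(predicted, answer_vectors)
print(best)
```

Because the candidate set is just a dictionary of embeddings, new words or phrases can be added at prediction time without retraining a fixed output layer, which is the appeal over a top-K classifier.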

[1] https://cs.umd.edu/~miyyer/qblearn/

[2] https://github.com/VT-vision-lab/VQA_LSTM_CNN/issues/14

1 comment

Hi syllogism,

Right now everyone working on this is highly focussed on the competition and on trying to beat the numbers. For that purpose, they will certainly want to stick to predicting the top K answers.

For example, see this table:

  Model   Q+I [1]   Q+I+C [1]   ATT 1000   ATT Full
  ACC.    0.2678    0.2939      0.4838     0.4651
Here ATT Full is the variant that uses all the words, and it performs worse than ATT 1000. Source: Chen, K., Wang, J., Chen, L. C., Gao, H., Xu, W., & Nevatia, R. (2015). ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering. arXiv preprint arXiv:1511.05960.

Once the competition is over (in roughly two months), there will be more focus on the actual AI part, where generating the answers would be the right thing to do. There are other papers that use an external knowledge base like DBPedia, and the "answer word" could certainly be picked up from there.

What you have suggested is a very interesting approach, and I am not aware of any paper that has tried it. Quite a few papers have tried to extend NLP QA to visual QA, but with limited success (except the MetaMind people). I will certainly keep it on my ideas-to-try list, and I will update you if I get some results.

P.S.: Thank you for creating spaCy. I love it and I use it every day!