1505.05612
Cited By
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
21 May 2015
Haoyuan Gao
Junhua Mao
Jie Zhou
Zhiheng Huang
Lei Wang
W. Xu
Papers citing "Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering"
50 / 75 papers shown
Title
Text-Guided Coarse-to-Fine Fusion Network for Robust Remote Sensing Visual Question Answering
Zhicheng Zhao
Changfu Zhou
Yu Zhang
Chenglong Li
Xiaoliang Ma
Jin Tang
76
0
0
24 Nov 2024
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
Shota Onohara
Atsuyuki Miyai
Yuki Imajuku
Kazuki Egashira
Jeonghun Baek
Xiang Yue
Graham Neubig
Kiyoharu Aizawa
OSLM
103
1
0
22 Oct 2024
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Genta Indra Winata
Frederikus Hudi
Patrick Amadeus Irawan
David Anugraha
Rifki Afina Putri
...
Alham Fikri Aji
Taro Watanabe
Derry Wijaya
Alice H. Oh
Chong-Wah Ngo
CoGe
105
9
0
16 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
82
25
0
04 Oct 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering
Jingqun Tang
Qi Liu
Yongjie Ye
Jinghui Lu
Shubo Wei
...
Yanjie Wang
Yuliang Liu
Hao Liu
Xiang Bai
Can Huang
34
22
0
20 May 2024
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion
Xing Han
Huy Nguyen
Carl Harris
Nhat Ho
S. Saria
MoE
77
16
0
05 Feb 2024
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
19
1
0
12 Oct 2023
HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language
Shantipriya Parida
Idris Abdulmumin
Shamsuddeen Hassan Muhammad
Aneesh Bose
Guneet Singh Kohli
I. Ahmad
Ketan Kotwal
S. Sarkar
Ondrej Bojar
Habeebah Adamu Kakudi
22
4
0
28 May 2023
Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering
T. M. Thai
Son T. Luu
32
0
0
22 Mar 2023
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models
Ali Borji
CoGe
10
1
0
28 Jan 2023
Curriculum Script Distillation for Multilingual Visual Question Answering
Khyathi Raghavi Chandu
A. Geramifard
25
0
0
17 Jan 2023
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
18
7
0
16 Nov 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
24
62
0
04 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
Delving Deeper into Cross-lingual Visual Question Answering
Chen Cecilia Liu
Jonas Pfeiffer
Anna Korhonen
Ivan Vulić
Iryna Gurevych
26
8
0
15 Feb 2022
Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices
Mikołaj Małkiński
Jacek Mańdziuk
120
41
0
28 Jan 2022
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation
Han Zhang
Weichong Yin
Yewei Fang
Lanxin Li
Boqiang Duan
Zhihua Wu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
27
58
0
31 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
24
46
0
15 Dec 2021
Multimodal Dialogue Response Generation
Qingfeng Sun
Yujing Wang
Can Xu
Kai Zheng
Yaming Yang
Huang Hu
Fei Xu
Jessica Zhang
Xiubo Geng
Daxin Jiang
15
43
0
16 Oct 2021
Asking questions on handwritten document collections
Minesh Mathew
Lluís Gómez
Dimosthenis Karatzas
C. V. Jawahar
RALM
20
11
0
02 Oct 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
26
56
0
13 Sep 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
35
813
0
14 Jun 2021
Designing Multimodal Datasets for NLP Challenges
James Pustejovsky
E. Holderness
Jingxuan Tu
Parker Glenn
Kyeongmin Rim
Kelley Lynch
R. Brutti
13
5
0
12 May 2021
TorchPRISM: Principal Image Sections Mapping, a novel method for Convolutional Neural Network features visualization
Tomasz Szandała
15
1
0
27 Jan 2021
TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark
Eunsol Choi
Michael Collins
Dan Garrette
Tom Kwiatkowski
Vitaly Nikolaev
J. Palomaki
35
589
0
10 Mar 2020
Robust Explanations for Visual Question Answering
Badri N. Patro
Shivansh Patel
Vinay P. Namboodiri
OOD
AAML
6
20
0
23 Jan 2020
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Vedika Agarwal
Rakshith Shetty
Mario Fritz
CML
AAML
21
155
0
16 Dec 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
Factor Graph Attention
Idan Schwartz
Seunghak Yu
Tamir Hazan
A. Schwing
19
110
0
11 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
19
69
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
30
117
0
11 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
Xin Eric Wang
Jiawei Wu
Junkun Chen
Lei Li
Yuan-fang Wang
William Yang Wang
15
540
0
06 Apr 2019
Audio-Visual Scene-Aware Dialog
Huda AlAmri
Vincent Cartillier
Abhishek Das
Jue Wang
A. Cherian
...
Tim K. Marks
Chiori Hori
Peter Anderson
Stefan Lee
Devi Parikh
VGen
23
189
0
25 Jan 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
31
321
0
20 Jan 2019
Coarse-to-fine: A RNN-based hierarchical attention model for vehicle re-identification
Xiu-Shen Wei
Chen-Lin Zhang
Lingqiao Liu
Chunhua Shen
Jianxin Wu
6
43
0
11 Dec 2018
Textually Enriched Neural Module Networks for Visual Question Answering
Khyathi Raghavi Chandu
Mary Arpita Pyreddy
Matthieu Felix
N. Joshi
24
6
0
23 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Jie Lei
Licheng Yu
Mohit Bansal
Tamara L. Berg
23
616
0
05 Sep 2018
Joint Image Captioning and Question Answering
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
22
12
0
22 May 2018
Defoiling Foiled Image Captions
Pranava Madhyastha
Josiah Wang
Lucia Specia
22
9
0
16 May 2018
Interactive Grounded Language Acquisition and Generalization in a 2D World
Haonan Yu
Haichao Zhang
W. Xu
LLMAG
LM&Ro
14
77
0
31 Jan 2018
Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing
Ravi Kiran Sarvadevabhatla
Shiv Surya
Trisha Mittal
Venkatesh Babu Radhakrishnan
13
14
0
29 Jan 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Qing Li
Jianlong Fu
D. Yu
Tao Mei
Jiebo Luo
FAtt
XAI
CoGe
46
60
0
27 Jan 2018
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Aishwarya Agrawal
Dhruv Batra
Devi Parikh
Aniruddha Kembhavi
OOD
51
581
0
01 Dec 2017
Adversarial Attacks Beyond the Image Space
Xiaohui Zeng
Chenxi Liu
Yu-Siang Wang
Weichao Qiu
Lingxi Xie
Yu-Wing Tai
Chi-Keung Tang
Alan Yuille
AAML
25
145
0
20 Nov 2017
Active Learning for Visual Question Answering: An Empirical Study
Xiaoyu Lin
Devi Parikh
36
31
0
06 Nov 2017
VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
Chuang Gan
Yandong Li
Haoxiang Li
Chen Sun
Boqing Gong
19
126
0
15 Aug 2017
Fluency-Guided Cross-Lingual Image Captioning
Weiyu Lan
Xirong Li
Jianfeng Dong
19
92
0
15 Aug 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
15
136
0
05 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
13
2,856
0
26 May 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
16
9
0
03 May 2017