arXiv: 1908.04289
Multi-modality Latent Interaction Network for Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2019
10 August 2019
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
Papers citing "Multi-modality Latent Interaction Network for Visual Question Answering"
44 papers
Hadamard product in deep learning: Introduction, Advances and Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
Volkan Cevher
AAML
17 Apr 2025
A Pattern to Align Them All: Integrating Different Modalities to Define Multi-Modal Entities
Gianluca Apriceno
Valentina Tamma
Tania Bailoni
Jacopo de Berardinis
Mauro Dragoni
17 Oct 2024
Listen Then See: Video Alignment with Speaker Attention
Aviral Agrawal
Carlos Mateo Samudio Lezcano
Iqui Balam Heredia-Marin
P. Sethi
21 Apr 2024
Object Attribute Matters in Visual Question Answering
Peize Li
Q. Si
Peng Fu
Zheng Lin
Yan Wang
20 Dec 2023
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
12 Oct 2023
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
28 Feb 2023
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
16 Dec 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
13 Oct 2022
Visuo-Tactile Transformers for Manipulation
Conference on Robot Learning (CoRL), 2022
Yizhou Chen
A. Sipos
Mark Van der Merwe
Nima Fazeli
ViT
30 Sep 2022
Interactive Question Answering Systems: Literature Review
ACM Computing Surveys (ACM CSUR), 2022
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
04 Sep 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
IEEE Transactions on Image Processing (IEEE TIP), 2022
Gen Luo
Weihao Ye
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
16 Apr 2022
Question-Driven Graph Fusion Network For Visual Question Answering
IEEE International Conference on Multimedia and Expo (ICME), 2022
Yuxi Qian
Yuncong Hu
Ruonan Wang
Fangxiang Feng
Xiaojie Wang
GNN
03 Apr 2022
General Greedy De-bias Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Qi Tian
20 Dec 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
28 Aug 2021
Fast Convergence of DETR with Spatially Modulated Co-Attention
IEEE International Conference on Computer Vision (ICCV), 2021
Peng Gao
Minghang Zheng
Xiaogang Wang
Jifeng Dai
Hongsheng Li
ViT
05 Aug 2021
Greedy Gradient Ensemble for Robust Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2021
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Q. Tian
27 Jul 2021
Oriented Object Detection with Transformer
Teli Ma
Mingyuan Mao
Honghui Zheng
Peng Gao
Xiaodi Wang
Shumin Han
Errui Ding
Baochang Zhang
David Doermann
ViT
06 Jun 2021
Scalable Transformers for Neural Machine Translation
Peng Gao
Shijie Geng
Ping Luo
Xiaogang Wang
Jifeng Dai
Hongsheng Li
04 Jun 2021
Container: Context Aggregation Network
Neural Information Processing Systems (NeurIPS), 2021
Peng Gao
Jiasen Lu
Hongsheng Li
Roozbeh Mottaghi
Aniruddha Kembhavi
ViT
02 Jun 2021
Dual-stream Network for Visual Recognition
Neural Information Processing Systems (NeurIPS), 2021
Mingyuan Mao
Renrui Zhang
Honghui Zheng
Peng Gao
Teli Ma
Yan Peng
Errui Ding
Baochang Zhang
Shumin Han
ViT
31 May 2021
What is Multimodality?
Letitia Parcalabescu
Nils Trost
Anette Frank
10 Mar 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
16 Jan 2021
End-to-End Object Detection with Adaptive Clustering Transformer
British Machine Vision Conference (BMVC), 2020
Minghang Zheng
Peng Gao
Renrui Zhang
Kunchang Li
Xiaogang Wang
Hongsheng Li
Hao Dong
ViT
18 Nov 2020
Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog
Shen Gao
Xiuying Chen
Li Liu
Dongyan Zhao
Rui Yan
05 Nov 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
19 Oct 2020
Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering
International Conference on Pattern Recognition (ICPR), 2020
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
17 Oct 2020
Multi-Pass Transformer for Machine Translation
Peng Gao
Chiori Hori
Shijie Geng
Takaaki Hori
Jonathan Le Roux
23 Sep 2020
A Simple Yet Effective Method for Video Temporal Grounding with Cross-Modality Attention
Binjie Zhang
Yu Li
Chun Yuan
D. Xu
Pin Jiang
Ying Shan
23 Sep 2020
Visual Question Answering on Image Sets
European Conference on Computer Vision (ECCV), 2020
Ankan Bansal
Yuting Zhang
Rama Chellappa
CoGe
27 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
18 Aug 2020
HAMLET: A Hierarchical Multimodal Attention-based Human Activity Recognition Algorithm
Md. Mofijul Islam
Tariq Iqbal
03 Aug 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM
SSL
CLIP
26 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
European Conference on Computer Vision (ECCV), 2020
K. Gouthaman
Anurag Mittal
13 Jul 2020
Extreme Low-Light Imaging with Multi-granulation Cooperative Networks
Keqi Wang
Peng Gao
Guosheng Lin
Qian Guo
Y. Qian
16 May 2020
Character Matters: Video Story Understanding with Character-Aware Relations
Shijie Geng
Ji Zhang
Zuohui Fu
Peng Gao
Hang Zhang
Gerard de Melo
09 May 2020
A Novel Attention-based Aggregation Function to Combine Vision and Language
International Conference on Pattern Recognition (ICPR), 2020
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
27 Apr 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Computer Vision and Pattern Recognition (CVPR), 2020
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
19 Mar 2020
Learning to Respond with Stickers: A Framework of Unifying Multi-Modality in Multi-Turn Dialog
The Web Conference (WWW), 2020
Shen Gao
Xiuying Chen
Chang Liu
Li Liu
Dongyan Zhao
Rui Yan
10 Mar 2020
Unshuffling Data for Improved Generalization
IEEE International Conference on Computer Vision (ICCV), 2021
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
27 Feb 2020
CQ-VQA: Visual Question Answering on Categorized Questions
IEEE International Joint Conference on Neural Networks (IJCNN), 2020
Aakansha Mishra
A. Anand
Prithwijit Guha
17 Feb 2020
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Lei Shi
Shijie Geng
Kai Shuang
Chiori Hori
Songxiang Liu
Peng Gao
Sen Su
03 Jan 2020
Fastened CROWN: Tightened Neural Network Robustness Certificates
AAAI Conference on Artificial Intelligence (AAAI), 2019
Zhaoyang Lyu
Ching-Yun Ko
Zhifeng Kong
Ngai Wong
Dahua Lin
Luca Daniel
02 Dec 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Hao Tan
Mohit Bansal
VLM
MLLM
20 Aug 2019
Bilinear Graph Networks for Visual Question Answering
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2019
Dalu Guo
Chang Xu
Dacheng Tao
GNN
23 Jul 2019