Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering

3 April 2018
Duy-Kien Nguyen
Takayuki Okatani

Papers citing "Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering"

50 / 102 papers shown
Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments
Zhao Tong
Chunlin Gong
Yimeng Gu
Haichao Shi
Qiang Liu
Shu Wu
Xiao-Yu Zhang
AAML
43
0
0
10 Oct 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM CoGe LRM
248
6
0
24 Aug 2025
Language-based Audio Retrieval with Co-Attention Networks
Haoran Sun
Xiping Hu
Qiuyi Chen
Jianjun Chen
Jia Wang
Haiyang Zhang
126
0
0
31 Dec 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
229
7
0
17 Nov 2024
SRC-Net: Bi-Temporal Spatial Relationship Concerned Network for Change Detection
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE JSTARS), 2024
Hongjia Chen
Xin Xu
Fangling Pu
221
12
0
09 Jun 2024
MAST: Video Polyp Segmentation with a Mixture-Attention Siamese Transformer
Geng Chen
Junqing Yang
Xiaozhou Pu
Ge-Peng Ji
Huan Xiong
Yongsheng Pan
Hengfei Cui
Yong-quan Xia
MedIm ViT
170
2
0
23 Jan 2024
Hierarchical Graph Pattern Understanding for Zero-Shot VOS
Gensheng Pei
Fumin Shen
Yazhou Yao
Tao Chen
Xian-Sheng Hua
Jikang Cheng
VOS
168
4
0
15 Dec 2023
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations
Yuichi Inoue
Yuki Yada
Kotaro Tanahashi
Yu Yamaguchi
152
33
0
11 Dec 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Information Fusion (Inf. Fusion), 2023
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
356
68
0
01 Nov 2023
Spatio-temporal Co-attention Fusion Network for Video Splicing Localization
Man Lin
Gang Cao
Zijie Lou
115
3
0
18 Sep 2023
Syntax Tree Constrained Graph Network for Visual Question Answering
International Conference on Neural Information Processing (ICONIP), 2023
Xiangrui Su
Tao Gui
Chongyang Shi
Jiachang Liu
Liang Hu
GNN NAI
120
3
0
17 Sep 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
223
0
0
31 May 2023
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
AAAI Conference on Artificial Intelligence (AAAI), 2023
Tianwen Qian
Yue Yu
Linhai Zhuo
Yang Jiao
Yueping Jiang
194
243
0
24 May 2023
When Search Meets Recommendation: Learning Disentangled Search Representation for Recommendation
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Zihua Si
Zhongxiang Sun
Xiao Zhang
Jun Xu
Xiaoxue Zang
Yang Song
Kun Gai
Jirong Wen
AI4TS
146
36
0
18 May 2023
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
International Journal on Document Analysis and Recognition (IJDAR), 2021
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
123
9
0
11 May 2023
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Guillaume Jaume
Anurag J. Vaidya
Richard J. Chen
Drew F. K. Williamson
Paul Pu Liang
Faisal Mahmood
308
97
0
13 Apr 2023
Co-attention Propagation Network for Zero-Shot Video Object Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Gensheng Pei
Yazhou Yao
Fumin Shen
Daniel Huang
Xing-Rui Huang
Hengtao Shen
VOS
249
15
0
08 Apr 2023
Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering
T. Yamane
Pang-jo Chun
Jiachen Dang
Takayuki Okatani
105
1
0
18 Feb 2023
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications
Muhammad Arslan Manzoor
S. Albarri
Ziting Xian
Zaiqiao Meng
Preslav Nakov
Shangsong Liang
AI4TS
266
46
0
01 Feb 2023
Adaptively Clustering Neighbor Elements for Image-Text Generation
Zihua Wang
Xu Yang
Hanwang Zhang
Haiyang Xu
Mingshi Yan
Feisi Huang
Yu Zhang
VLM
409
0
0
05 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
145
18
0
26 Dec 2022
SceneGATE: Scene-Graph based co-Attention networks for TExt visual question answering
Feiqi Cao
Siwen Luo
F. Núñez
Zean Wen
Josiah Poon
Caren Han
GNN
332
5
0
16 Dec 2022
AlignVE: Visual Entailment Recognition Based on Alignment Relations
IEEE Transactions on Multimedia (IEEE TMM), 2022
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
117
7
0
16 Nov 2022
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering
IEEE Transactions on Medical Imaging (IEEE TMI), 2022
Xiaofei Huang
Hongfang Gong
MedIm
182
21
0
01 Oct 2022
Localizing Anatomical Landmarks in Ocular Images using Zoom-In Attentive Networks
Xiaofeng Lei
Shaohua Li
Xinxing Xu
Huazhu Fu
Yong Liu
...
Mingrui Tan
Yanyu Xu
Jocelyn Hui Lin Goh
Rick Siow Mong Goh
Ching-Yu Cheng
169
1
0
25 Sep 2022
Changer: Feature Interaction is What You Need for Change Detection
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Sheng Fang
Kaiyu Li
Zhe Li
195
244
0
17 Sep 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
IEEE International Conference on Data Engineering (ICDE), 2022
Shangfei Zheng
Weiqing Wang
Jianfeng Qu
Hongzhi Yin
Wei Chen
Lei Zhao
LRM
157
32
0
03 Sep 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering
IEEE Transactions on Image Processing (IEEE TIP), 2022
Yudong Han
Jianhua Yin
Yue Yu
Yin-wei Wei
Liqiang Nie
163
10
0
21 Jul 2022
Do You Know My Emotion? Emotion-Aware Strategy Recognition towards a Persuasive Dialogue System
Wei Peng
Yue Hu
Luxi Xing
Yuqiang Xie
Yajing Sun
131
4
0
24 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2019
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
167
70
0
02 Jun 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Information Fusion (Inf. Fusion), 2022
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Lin Wang
253
229
0
16 Apr 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Findings (Findings), 2022
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
144
19
0
02 Apr 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
187
91
0
29 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
IEEE Transactions on Multimedia (IEEE TMM), 2022
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
132
5
0
24 Mar 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2022
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
108
76
0
17 Mar 2022
CADRE: A Cascade Deep Reinforcement Learning Framework for Vision-based Autonomous Urban Driving
AAAI Conference on Artificial Intelligence (AAAI), 2022
Yinuo Zhao
Kun Wu
Zhiyuan Xu
Zhengping Che
Qi Lu
Jian Tang
C. Liu
167
35
0
17 Feb 2022
Text is no more Enough! A Benchmark for Profile-based Spoken Language Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2021
Xiao Xu
Libo Qin
Kaiji Chen
Guoxing Wu
Linlin Li
Wanxiang Che
160
7
0
22 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
156
27
0
14 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
209
2
0
11 Dec 2021
A Simple Long-Tailed Recognition Baseline via Vision-Language Model
Teli Ma
Shijie Geng
Mengmeng Wang
Jing Shao
Jiasen Lu
Jiaming Song
Shiyang Feng
Yu Qiao
VLM
206
61
0
29 Nov 2021
MMIU: Dataset for Visual Intent Understanding in Multimodal Assistants
Alkesh Patel
Joel Ruben Antony Moniz
R. Nguyen
Nicholas Tzou
Hadas Kotek
Vincent Renkens
VGen
122
1
0
13 Oct 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Conference on Computational Natural Language Learning (CoNLL), 2021
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
145
21
0
27 Sep 2021
Auto-Parsing Network for Image Captioning and Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2021
Xu Yang
Chongyang Gao
Hanwang Zhang
Jianfei Cai
197
41
0
24 Aug 2021
A Better Loss for Visual-Textual Grounding
ACM Symposium on Applied Computing (SAC), 2021
Davide Rigoni
Luciano Serafini
A. Sperduti
ObjD
133
3
0
11 Aug 2021
Understand me, if you refer to Aspect Knowledge: Knowledge-aware Gated Recurrent Memory Network
IEEE Transactions on Emerging Topics in Computational Intelligence (IEEE TETCI), 2021
Bowen Xing
Ivor W. Tsang
196
18
0
05 Aug 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
IEEE Transactions on Multimedia (IEEE Trans. Multimedia), 2021
Jianyu Wang
Bingkun Bao
Changsheng Xu
172
88
0
10 Jul 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Artificial Intelligence Review (AIR), 2021
Alana de Santana Correia
Esther Luna Colombini
HAI
276
246
0
31 Mar 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Computer Vision and Pattern Recognition (CVPR), 2021
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
94
18
0
23 Mar 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Swathikiran Sudhakaran
Sergio Escalera
Oswald Lanz
EgoV
157
21
0
16 Feb 2021
M2FN: Multi-step Modality Fusion for Advertisement Image Assessment
Applied Soft Computing (Appl Soft Comput), 2021
Kyung-Wha Park
Jung-Woo Ha
Junghoon Lee
Sunyoung Kwon
Kyung-Min Kim
Byoung-Tak Zhang
188
3
0
31 Jan 2021