Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1611.00471
Cited By
Dual Attention Networks for Multimodal Reasoning and Matching
2 November 2016
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Dual Attention Networks for Multimodal Reasoning and Matching"
50 / 62 papers shown
Title
Hadamard product in deep learning: Introduction, Advances and Challenges
Grigorios G. Chrysos
Yongtao Wu
Razvan Pascanu
Philip Torr
V. Cevher
AAML
96
0
0
17 Apr 2025
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
35
5
0
15 Sep 2024
Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training
Mengzhao Jia
Zhihan Zhang
W. Yu
Fangkai Jiao
Meng-Long Jiang
VLM
ReLM
LRM
48
7
0
22 Apr 2024
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering
Khiem Vinh Tran
Kiet Van Nguyen
N. Nguyen
ViT
23
2
0
28 Jul 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu
Hanzhuo Tan
Jing Li
Piji Li
19
5
0
26 Feb 2023
Oriented Object Detection in Optical Remote Sensing Images using Deep Learning: A Survey
Kunlin Wang
Zi Wang
Zhang Li
Ang Su
Xichao Teng
Minhao Liu
Qifeng Yu
Qifeng Yu
ObjD
83
8
0
21 Feb 2023
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
21
5
0
12 Jan 2023
Improving Cross-Modal Retrieval with Set of Diverse Embeddings
Dongwon Kim
Nam-Won Kim
Suha Kwak
16
37
0
30 Nov 2022
Cross-modal Semantic Enhanced Interaction for Image-Sentence Retrieval
Xuri Ge
Fuhai Chen
Songpei Xu
Fuxiang Tao
J. Jose
13
26
0
17 Oct 2022
MKANet: A Lightweight Network with Sobel Boundary Loss for Efficient Land-cover Classification of Satellite Remote Sensing Imagery
Zhiqi Zhang
W. Lu
Jinshan Cao
Guangqi Xie
22
15
0
28 Jul 2022
CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts
Muskan Garg
Chandni Saxena
V. Krishnan
R. Joshi
S. Saha
Vijay K. Mago
Bonnie J. Dorr
CML
19
36
0
11 Jul 2022
A cross-corpus study on speech emotion recognition
R. Milner
Md. Asif Jalal
Raymond W. M. Ng
Thomas Hain
13
30
0
05 Jul 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
19
62
0
04 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
21
16
0
02 Apr 2022
Two-stream Hierarchical Similarity Reasoning for Image-text Matching
Ran Chen
Hanli Wang
Lei Wang
Sam Kwong
13
9
0
10 Mar 2022
SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering
Peixi Xiong
Quanzeng You
Pei Yu
Zicheng Liu
Ying Wu
10
5
0
25 Jan 2022
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
40
2
0
11 Dec 2021
Quantifying the Suicidal Tendency on Social Media: A Survey
Muskan Garg
14
4
0
04 Oct 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
33
16
0
27 Sep 2021
DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval
A. Zhu
Zijie Wang
Yifeng Li
Xili Wan
Jing Jin
Tian Wang
Fangqiang Hu
G. Hua
93
162
0
12 Sep 2021
Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval
Xuri Ge
Fuhai Chen
J. Jose
Zhilong Ji
Zhongqin Wu
Xiao-Chang Liu
20
53
0
05 Aug 2021
ICECAP: Information Concentrated Entity-aware Image Captioning
Anwen Hu
Shizhe Chen
Qin Jin
14
20
0
04 Aug 2021
Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
Abhishek Aich
Meng Zheng
Srikrishna Karanam
Terrence Chen
A. Roy-Chowdhury
Ziyan Wu
29
70
0
25 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
42
428
0
01 Jul 2021
Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images
L. Ding
Dong Lin
Shaofu Lin
Jing Zhang
Xiaojie Cui
Yuebin Wang
H. Tang
Lorenzo Bruzzone
ViT
21
97
0
29 Jun 2021
Step-Wise Hierarchical Alignment Network for Image-Text Matching
Zhong Ji
Kexin Chen
Haoran Wang
19
93
0
11 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
48
329
0
07 Jun 2021
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
K. Sohn
157
100
0
29 Apr 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
27
59
0
22 Mar 2021
Adaptive Multi-Teacher Multi-level Knowledge Distillation
Yuang Liu
Wei Zhang
Jun Wang
8
157
0
06 Mar 2021
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
Multifaceted Context Representation using Dual Attention for Ontology Alignment
Vivek Iyer
Arvind Agarwal
Harshit Kumar
16
17
0
16 Oct 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
25
59
0
14 Aug 2020
Graph Structured Network for Image-Text Matching
Chunxiao Liu
Zhendong Mao
Tianzhu Zhang
Hongtao Xie
Bin Wang
Yongdong Zhang
9
232
0
01 Apr 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
173
68
0
07 Mar 2020
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
26
9
0
31 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
17
25
0
30 Sep 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
15
67
0
27 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
28
59
0
26 Sep 2019
Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators
Kuang-Huei Lee
Hamid Palangi
Xi Chen
Houdong Hu
Jianfeng Gao
VLM
16
37
0
22 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao W. Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
23
299
0
12 Sep 2019
Matching Images and Text with Multi-modal Tensor Fusion and Re-ranking
Tan Wang
Xing Xu
Yang Yang
Alan Hanjalic
Heng Tao Shen
Jingkuan Song
17
145
0
12 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
28
386
0
31 Jul 2019
Joint Visual-Textual Embedding for Multimodal Style Search
Gil Sadeh
L. Fritz
Gabi Shalev
Eduard Oks
17
8
0
15 Jun 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
26
81
0
15 May 2019
Weakly Supervised Video Moment Retrieval From Text Queries
Niluthpol Chowdhury Mithun
S. Paul
A. Roy-Chowdhury
16
192
0
05 Apr 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao W. Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
17
82
0
01 Mar 2019
Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog
Zhe Gan
Yu Cheng
Ahmed El Kholy
Linjie Li
Jingjing Liu
Jianfeng Gao
6
104
0
01 Feb 2019
1
2
Next