Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1903.12314
Cited By
Relation-Aware Graph Attention Network for Visual Question Answering
29 March 2019
Linjie Li
Zhe Gan
Yu Cheng
Jingjing Liu
GNN
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Relation-Aware Graph Attention Network for Visual Question Answering"
37 / 37 papers shown
Title
Enhancing Vision-Language Models with Scene Graphs for Traffic Accident Understanding
Aaron Lohner
Francesco Compagno
Jonathan M Francis
A. Oltramari
55
2
0
10 Jan 2025
Maintenance Required: Updating and Extending Bootstrapped Human Activity Recognition Systems for Smart Homes
S. Hiremath
Thomas Ploetz
21
1
0
20 Jun 2024
SeCG: Semantic-Enhanced 3D Visual Grounding via Cross-modal Graph Attention
Feng Xiao
Hongbin Xu
Qiuxia Wu
Wenxiong Kang
27
2
0
13 Mar 2024
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine
Kanta Kaneda
Shunya Nagashima
Ryosuke Korekata
Motonari Kambara
Komei Sugiura
25
6
0
26 Dec 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Alan L. Yuille
CoGe
19
12
0
27 Oct 2023
Scene Graph Conditioning in Latent Diffusion
Frank Fundel
DiffM
25
0
0
16 Oct 2023
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
23
4
0
26 Jul 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
89
11
0
03 Mar 2023
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
Xinyue Hu
Lin Gu
Kazuma Kobayashi
Qi A. An
Qingyu Chen
Zhiyong Lu
Chang Su
Tatsuya Harada
Yingying Zhu
GNN
21
9
0
19 Feb 2023
Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs
Osman Ulger
Julian Wiederer
Mohsen Ghafoorian
Vasileios Belagiannis
Pascal Mettes
35
0
0
06 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Max Planck Institute for Informatics
OOD
LRM
19
56
0
01 Dec 2022
Swinv2-Imagen: Hierarchical Vision Transformer Diffusion Models for Text-to-Image Generation
Rui Li
Weihua Li
Yi Yang
Hanyu Wei
Jianhua Jiang
Quan-wei Bai
DiffM
19
11
0
18 Oct 2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering
Hao Li
Jinfa Huang
Peng Jin
Guoli Song
Qi Wu
Jie Chen
27
20
0
21 Sep 2022
GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
Zhi-Qi Cheng
Qianwen Dai
Siyao Li
Teruko Mitamura
Alexander G. Hauptmann
11
34
0
18 Aug 2022
BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval
Wenqiao Zhang
Jiannan Guo
Meng Li
Haochen Shi
Shengyu Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
44
6
0
09 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
25
2
0
02 Jul 2022
Recent, rapid advancement in visual question answering architecture: a review
V. Kodali
Daniel Berleant
27
9
0
02 Mar 2022
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
23
20
0
14 Dec 2021
Video as Conditional Graph Hierarchy for Multi-Granular Question Answering
Junbin Xiao
Angela Yao
Zhiyuan Liu
Yicong Li
Wei Ji
Tat-Seng Chua
23
111
0
12 Dec 2021
Language bias in Visual Question Answering: A Survey and Taxonomy
Desen Yuan
13
12
0
16 Nov 2021
HR-RCNN: Hierarchical Relational Reasoning for Object Detection
Hao Chen
Abhinav Shrivastava
13
1
0
26 Oct 2021
Multimodal Dialogue Response Generation
Qingfeng Sun
Yujing Wang
Can Xu
Kai Zheng
Yaming Yang
Huang Hu
Fei Xu
Jessica Zhang
Xiubo Geng
Daxin Jiang
15
43
0
16 Oct 2021
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos
Zongmeng Zhang
Xianjing Han
Xuemeng Song
Yan Yan
Liqiang Nie
33
36
0
12 Oct 2021
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
52
10
0
24 Sep 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
19
52
0
16 Aug 2021
Instance-Level Relative Saliency Ranking with Graph Reasoning
Nian Liu
Long Li
Wangbo Zhao
Junwei Han
Ling Shao
22
27
0
08 Jul 2021
Learning from Pixel-Level Label Noise: A New Perspective for Semi-Supervised Semantic Segmentation
Rumeng Yi
Yaping Huang
Q. Guan
Mengyang Pu
Runsheng Zhang
NoLa
13
27
0
26 Mar 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang-Dong Yoo
18
26
0
24 Mar 2021
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues
Hung Le
Nancy F. Chen
S. Hoi
26
14
0
01 Mar 2021
Efficient Graph Deep Learning in TensorFlow with tf_geometric
Jun Hu
Shengsheng Qian
Quan Fang
Youze Wang
Quan Zhao
Huaiwen Zhang
Changsheng Xu
GNN
22
53
0
27 Jan 2021
Graph Neural Networks: Taxonomy, Advances and Trends
Yu Zhou
Haixia Zheng
Xin Huang
Shufeng Hao
Dengao Li
Jumin Zhao
AI4TS
23
113
0
16 Dec 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
24
485
0
11 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
6
19
0
30 Apr 2020
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
18
1,909
0
09 Aug 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
21
50
0
28 Jul 2019
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
140
560
0
27 Feb 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
144
1,464
0
06 Jun 2016
1