2206.01923
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
4 June 2022
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
Papers citing
"From Pixels to Objects: Cubic Visual Attention for Visual Question Answering"
17 / 17 papers shown
Title
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
Erjian Guo
Zhen Zhao
Zicheng Wang
Tong Chen
Yunyi Liu
Luping Zhou
DiffM
MedIm
53
0
0
24 Mar 2025
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
Peiyuan Chen
Zecheng Zhang
Yiping Dong
Li Zhou
Han Wang
29
12
0
14 Aug 2024
Object Attribute Matters in Visual Question Answering
Peize Li
Q. Si
Peng Fu
Zheng Lin
Yan Wang
33
0
0
20 Dec 2023
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
25
4
0
26 Jul 2023
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
37
15
0
16 Dec 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
Fine-Grained Predicates Learning for Scene Graph Generation
Xinyu Lyu
Lianli Gao
Yuyu Guo
Zhou Zhao
Hao Huang
Hengtao Shen
Jingkuan Song
22
36
0
06 Apr 2022
One-shot Scene Graph Generation
Yuyu Guo
Jingkuan Song
Lianli Gao
Heng Tao Shen
25
29
0
22 Feb 2022
GAN-based Reactive Motion Synthesis with Class-aware Discriminators for Human-human Interaction
Qianhui Men
Hubert P. H. Shum
Edmond S. L. Ho
Howard Leung
25
28
0
01 Oct 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
22
1
0
06 Sep 2021
From General to Specific: Informative Scene Graph Generation via Balance Adjustment
Yuyu Guo
Lianli Gao
Xuanhan Wang
Yuxuan Hu
Xing Xu
Xu Lu
Heng Tao Shen
Jingkuan Song
58
84
0
30 Aug 2021
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei-Neng Chen
Weiping Wang
Li Liu
M. Lew
VLM
112
31
0
16 Oct 2020
Conditional Text Generation for Harmonious Human-Machine Interaction
Bin Guo
Hao Wang
Yasan Ding
Wei Wu
Shaoyang Hao
Yueqi Sun
Zhiwen Yu
21
4
0
08 Sep 2019
Attention in Natural Language Processing
Andrea Galassi
Marco Lippi
Paolo Torroni
GNN
28
467
0
04 Feb 2019
Attending Category Disentangled Global Context for Image Classification
Keke Tang
Guodong Wei
Runnan Chen
Jie Zhu
Zhaoquan Gu
Wenping Wang
12
0
0
17 Dec 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,465
0
06 Jun 2016