arXiv: 2206.01923

From Pixels to Objects: Cubic Visual Attention for Visual Question Answering

4 June 2022
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen

Papers citing "From Pixels to Objects: Cubic Visual Attention for Visual Question Answering"

17 / 17 papers shown
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
Erjian Guo
Zhen Zhao
Zicheng Wang
Tong Chen
Yunyi Liu
Luping Zhou
DiffM
MedIm
53
0
0
24 Mar 2025
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion
Peiyuan Chen
Zecheng Zhang
Yiping Dong
Li Zhou
Han Wang
27
12
0
14 Aug 2024
Object Attribute Matters in Visual Question Answering
Peize Li
Q. Si
Peng Fu
Zheng Lin
Yan Wang
33
0
0
20 Dec 2023
LOIS: Looking Out of Instance Semantics for Visual Question Answering
Siyu Zhang
Ye Chen
Yaoru Sun
Fang Wang
Haibo Shi
Haoran Wang
23
4
0
26 Jul 2023
HGAN: Hierarchical Graph Alignment Network for Image-Text Retrieval
Jie Guo
Meiting Wang
Yan Zhou
Bin Song
Yuhao Chi
Wei-liang Fan
Jianglong Chang
37
15
0
16 Dec 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
Fine-Grained Predicates Learning for Scene Graph Generation
Xinyu Lyu
Lianli Gao
Yuyu Guo
Zhou Zhao
Hao Huang
Hengtao Shen
Jingkuan Song
22
36
0
06 Apr 2022
One-shot Scene Graph Generation
Yuyu Guo
Jingkuan Song
Lianli Gao
Heng Tao Shen
25
29
0
22 Feb 2022
GAN-based Reactive Motion Synthesis with Class-aware Discriminators for Human-human Interaction
Qianhui Men
Hubert P. H. Shum
Edmond S. L. Ho
Howard Leung
25
28
0
01 Oct 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
22
1
0
06 Sep 2021
From General to Specific: Informative Scene Graph Generation via Balance Adjustment
Yuyu Guo
Lianli Gao
Xuanhan Wang
Yuxuan Hu
Xing Xu
Xu Lu
Heng Tao Shen
Jingkuan Song
58
84
0
30 Aug 2021
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei-Neng Chen
Weiping Wang
Li Liu
M. Lew
VLM
110
31
0
16 Oct 2020
Conditional Text Generation for Harmonious Human-Machine Interaction
Bin Guo
Hao Wang
Yasan Ding
Wei Wu
Shaoyang Hao
Yueqi Sun
Zhiwen Yu
21
4
0
08 Sep 2019
Attention in Natural Language Processing
Andrea Galassi
Marco Lippi
Paolo Torroni
GNN
25
467
0
04 Feb 2019
Attending Category Disentangled Global Context for Image Classification
Keke Tang
Guodong Wei
Runnan Chen
Jie Zhu
Zhaoquan Gu
Wenping Wang
12
0
0
17 Dec 2018
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
149
1,465
0
06 Jun 2016