ResearchTrend.AI
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering

4 August 2017
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao

Papers citing "Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering"

50 / 214 papers shown
Bidirectional Contrastive Split Learning for Visual Question Answering
Yuwei Sun
H. Ochiai
16
2
0
24 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
8
0
0
16 Aug 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering
Yudong Han
Jianhua Yin
Jianlong Wu
Yin-wei Wei
Liqiang Nie
25
7
0
21 Jul 2022
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
25
68
0
02 Jun 2022
HYCEDIS: HYbrid Confidence Engine for Deep Document Intelligence System
Bao-Sinh Nguyen
Q. Tran
Tuan-Anh Dang Nguyen
D. Nguyen
H. Le
18
0
0
01 Jun 2022
An Efficient Modern Baseline for FloodNet VQA
Aditya Kane
Sahil Khose
19
4
0
30 May 2022
From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data
Zhenghang Yuan
Lichao Mou
Q. Wang
Xiao Xiang Zhu
11
60
0
06 May 2022
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
Cheng Chen
Yudong Zhu
Zhenshan Tan
Qingrong Cheng
Xin Jiang
Qun Liu
X. Gu
23
39
0
01 May 2022
Bilinear value networks
Zhang-Wei Hong
Ge Yang
Pulkit Agrawal
OffRL
16
7
0
28 Apr 2022
Attention in Reasoning: Dataset, Analysis, and Modeling
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
28
3
0
20 Apr 2022
Attention Mechanism based Cognition-level Scene Understanding
Xuejiao Tang
Tai Le Quy
LRM
23
0
0
17 Apr 2022
Towards Lightweight Transformer via Group-wise Transformation for Vision-and-Language Tasks
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yan Wang
Liujuan Cao
Yongjian Wu
Feiyue Huang
Rongrong Ji
ViT
4
43
0
16 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
F. Khan
Ajmal Saeed Mian
19
145
0
16 Apr 2022
SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering
Vipul Gupta
Zhuowan Li
Adam Kortylewski
Chenyu Zhang
Yingwei Li
Alan Yuille
30
43
0
05 Apr 2022
Question-Driven Graph Fusion Network For Visual Question Answering
Yuxi Qian
Yuncong Hu
Ruonan Wang
Fangxiang Feng
Xiaojie Wang
GNN
16
10
0
03 Apr 2022
Co-VQA : Answering by Interactive Sub Question Sequence
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
21
16
0
02 Apr 2022
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
19
14
0
28 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
25
4
0
24 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
20
18
0
11 Mar 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
24
88
0
18 Feb 2022
Dual-Key Multimodal Backdoors for Visual Question Answering
Matthew Walmer
Karan Sikka
Indranil Sur
Abhinav Shrivastava
Susmit Jha
AAML
11
34
0
14 Dec 2021
Relational Graph Learning for Grounded Video Description Generation
Wenqiao Zhang
X. Wang
Siliang Tang
Haizhou Shi
Haochen Shi
Jun Xiao
Yueting Zhuang
W. Wang
11
33
0
02 Dec 2021
Ubi-SleepNet: Advanced Multimodal Fusion Techniques for Three-stage Sleep Classification Using Ubiquitous Sensing
B. Zhai
Yu Guan
M. Catt
Thomas Ploetz
9
6
0
19 Nov 2021
Medical Visual Question Answering: A Survey
Zhihong Lin
Donghao Zhang
Qingyi Tao
Danli Shi
Gholamreza Haffari
Qi Wu
M. He
Z. Ge
28
112
0
19 Nov 2021
Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition
Hengshun Zhou
Jun Du
Yuanyuan Zhang
Qing Wang
Qing-Feng Liu
Chin-Hui Lee
6
44
0
17 Nov 2021
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
24
12
0
17 Nov 2021
Temporal-attentive Covariance Pooling Networks for Video Recognition
Zilin Gao
Qilong Wang
Bingbing Zhang
Q. Hu
P. Li
13
24
0
27 Oct 2021
DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning
Yizhi Wang
Z. Lian
3DV
22
20
0
13 Oct 2021
Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction
Justin Wilson
Nicholas Rewkowski
Ming Lin
Henry Fuchs
19
1
0
05 Oct 2021
3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video
Justin Wilson
Ming-Chia Lin
11
1
0
05 Oct 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning
Ling Cheng
Wei Wei
Feida Zhu
Yong-jin Liu
C. Miao
ViT
16
3
0
29 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
29
19
0
27 Sep 2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis
Sylvain Lobry
B. Kellenberger
Bertrand Le Saux
D. Tuia
32
20
0
24 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
17
14
0
10 Sep 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
19
53
0
16 Aug 2021
Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
Rajat Koner
Hang Li
Marcel Hildebrandt
Deepan Das
Volker Tresp
Stephan Günnemann
41
31
0
13 Jul 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
15
75
0
10 Jul 2021
MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
Haiwei Pan
Shuning He
Kejia Zhang
Bo Qu
Chunling Chen
Kun Shi
12
11
0
07 Jul 2021
Cogradient Descent for Dependable Learning
Runqi Wang
Baochang Zhang
Lian Zhuo
QiXiang Ye
David Doermann
16
0
0
20 Jun 2021
VQA-Aid: Visual Question Answering for Post-Disaster Damage Assessment and Analysis
Argho Sarkar
Maryam Rahnemoonfar
15
19
0
19 Jun 2021
Coarse to Fine Two-Stage Approach to Robust Tensor Completion of Visual Data
Yicong He
George K. Atia
11
4
0
19 Jun 2021
LocalTrans: A Multiscale Local Transformer Network for Cross-Resolution Homography Estimation
Ruizhi Shao
Gaochang Wu
Yuemei Zhou
Ying Fu
Yebin Liu
ViT
16
42
0
08 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
10
55
0
24 May 2021
Graph Inference Representation: Learning Graph Positional Embeddings with Anchor Path Encoding
Yuheng Lu
Jinpeng Chen
Chuxiong Sun
Jie Hu
GNN
14
2
0
09 May 2021
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Feng Ji
Ji Zhang
A. Bimbo
17
34
0
05 May 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
40
18
0
02 May 2021
Augmenting Deep Classifiers with Polynomial Neural Networks
Grigorios G. Chrysos
Markos Georgopoulos
Jiankang Deng
Jean Kossaifi
Yannis Panagakis
Anima Anandkumar
17
18
0
16 Apr 2021
RTIC: Residual Learning for Text and Image Composition using Graph Convolutional Network
Minchul Shin
Yoonjae Cho
ByungSoo Ko
Geonmo Gu
8
44
0
07 Apr 2021
Attention, please! A survey of Neural Attention Models in Deep Learning
Alana de Santana Correia
Esther Luna Colombini
HAI
21
175
0
31 Mar 2021
Variational Structured Attention Networks for Deep Visual Representation Learning
Guanglei Yang
Paolo Rota
Xavier Alameda-Pineda
Dan Xu
M. Ding
Elisa Ricci
3DPC
28
3
0
05 Mar 2021