Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering

4 August 2017
Zhou Yu
Jun-chen Yu
Jianping Fan
Dacheng Tao

Papers citing "Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering"

50 / 214 papers shown
Title
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care
Md Asaduzzaman Jabin
Hanqi Jiang
Y. Li
Patrick Kaggwa
Eugene Douglass
Juliet N. Sekandi
Tianming Liu
LM&MA
73
0
0
01 May 2025
Domain Generalization for Face Anti-spoofing via Content-aware Composite Prompt Engineering
J. Guo
Ajian Liu
Yunfeng Diao
J. Zhang
Hui Ma
Bo Zhao
Richang Hong
Meng Wang
21
0
0
06 Apr 2025
Generalizable Prompt Learning of CLIP: A Brief Overview
Fangming Cui
Yonggang Zhang
Xuan Wang
Xule Wang
Liang Xiao
VPVLM
VLM
132
0
0
03 Mar 2025
Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry
Wenjun Hou
Yi Cheng
Kaishuai Xu
Yan Hu
Wenjie Li
Jiang-Dong Liu
28
0
0
17 Nov 2024
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering
Zhilin Zhang
Jie Wang
Zhanghao Qin
Ruiqi Zhu
Xiaoliang Gong
MedIm
39
0
0
28 Oct 2024
GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning
Haiwen Diao
Ying Zhang
Shang Gao
Jiawen Zhu
Long Chen
Huchuan Lu
29
4
0
20 Oct 2024
CMAL: A Novel Cross-Modal Associative Learning Framework for Vision-Language Pre-Training
Zhiyuan Ma
Jianjun Li
Guohui Li
Kaiyan Huang
VLM
54
9
0
16 Oct 2024
M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought
G. Kumari
Kirtan Jain
Asif Ekbal
18
1
0
11 Oct 2024
Detecting Android Malware by Visualizing App Behaviors from Multiple Complementary Views
Zhaoyi Meng
Jiale Zhang
Jiaqi Guo
Wansen Wang
Wenchao Huang
Jie Cui
Hong Zhong
Yan Xiong
AAML
13
1
0
08 Oct 2024
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning
Zheng Zhang
Xu Yuan
Lei Zhu
Jingkuan Song
Liqiang Nie
AAML
37
11
0
03 Oct 2024
MMBee: Live Streaming Gift-Sending Recommendations via Multi-Modal Fusion and Behaviour Expansion
Jiaxin Deng
Shiyao Wang
Yuchen Wang
Jiansong Qi
Liqin Zhao
Guorui Zhou
Gaofeng Meng
26
3
0
15 Jun 2024
Robust Latent Representation Tuning for Image-text Classification
Hao Sun
Yu Song
VLM
47
0
0
10 Jun 2024
DUPLEX: Dual GAT for Complex Embedding of Directed Graphs
Zhaoru Ke
Hang Yu
Jianguo Li
Haipeng Zhang
33
4
0
08 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
27
0
0
01 Jun 2024
PitVQA: Image-grounded Text Embedding LLM for Visual Question Answering in Pituitary Surgery
Runlong He
Mengya Xu
Adrito Das
Danyal Z. Khan
Sophia Bano
Hani J. Marcus
Danail Stoyanov
Matthew J. Clarkson
Mobarakol Islam
45
6
0
22 May 2024
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
Xiaotang Gai
Chenyi Zhou
Jiaxiang Liu
Yang Feng
Jian Wu
Zuo-Qiang Liu
MedIm
36
6
0
18 Apr 2024
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
26
1
0
09 Apr 2024
Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment
Minghua Zhang
Ke Chang
Yunfang Wu
30
1
0
11 Mar 2024
Free Form Medical Visual Question Answering in Radiology
Abhishek Narayanan
Rushabh Musthyala
Rahul Sankar
A. Nistala
P. Singh
Jacopo Cirrone
8
2
0
23 Jan 2024
Probabilistic Prediction of Longitudinal Trajectory Considering Driving Heterogeneity with Interpretability
Shuli Wang
Kun Gao
Lanfang Zhang
Yang Liu
Lei Chen
31
4
0
19 Dec 2023
UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts
Chenlu Zhan
Yufei Zhang
Yu Lin
Gaoang Wang
Hongwei Wang
VLM
MedIm
26
5
0
18 Dec 2023
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario
Yimin Sun
Chao Wang
Yan Peng
32
5
0
04 Dec 2023
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
H. F. Alsan
Taner Arsan
12
2
0
29 Oct 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
19
3
0
25 Oct 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
21
5
0
23 Sep 2023
TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models
Indranil Sur
Karan Sikka
Matthew Walmer
K. Koneripalli
Anirban Roy
Xiaoyu Lin
Ajay Divakaran
Susmit Jha
24
8
0
07 Aug 2023
Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset
Serkan Sulun
Pedro Oliveira
Paula Viana
23
0
0
27 Jul 2023
A scoping review on multimodal deep learning in biomedical images and texts
Zhaoyi Sun
Mingquan Lin
Qingqing Zhu
Qianqian Xie
Fei-Yue Wang
Zhiyong Lu
Yifan Peng
26
18
0
14 Jul 2023
Learning Prompt-Enhanced Context Features for Weakly-Supervised Video Anomaly Detection
Yujiang Pu
Xiaoyu Wu
Lulu Yang
Shengjin Wang
19
32
0
26 Jun 2023
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification
Souhail Bakkali
Zuheng Ming
Mickael Coustaty
Marçal Rusiñol
8
6
0
11 May 2023
Shape-Net: Room Layout Estimation from Panoramic Images Robust to Occlusion using Knowledge Distillation with 3D Shapes as Additional Inputs
M. Tabata
Kana Kurata
Junichiro Tamamatsu
3DV
3DPC
19
4
0
25 Apr 2023
SurgicalGPT: End-to-End Language-Vision GPT for Visual Question Answering in Surgery
Lalithkumar Seenivasan
Mobarakol Islam
Gokul Kannan
Hongliang Ren
13
39
0
19 Apr 2023
Exploring Multimodal Sentiment Analysis via CBAM Attention and Double-layer BiLSTM Architecture
Huiru Wang
Xiuhong Li
Zenyu Ren
Dan Yang
Chunming Ma
17
2
0
26 Mar 2023
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
Shi Chen
Qi Zhao
36
2
0
18 Mar 2023
Polar-VQA: Visual Question Answering on Remote Sensed Ice sheet Imagery from Polar Region
Argho Sarkar
Maryam Rahnemoonfar
20
1
0
13 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Jun Xiong
Gang Wang
Peng Zhang
Wei Huang
Yufei Zha
Guangtao Zhai
21
14
0
11 Mar 2023
RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training
Zheng Yuan
Qiao Jin
Chuanqi Tan
Zhengyun Zhao
Hongyi Yuan
Fei Huang
Songfang Huang
44
27
0
01 Mar 2023
VQA with Cascade of Self- and Co-Attention Blocks
Aakansha Mishra
Ashish Anand
Prithwijit Guha
33
0
0
28 Feb 2023
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
Xinyue Hu
Lin Gu
Kazuma Kobayashi
Qi A. An
Qingyu Chen
Zhiyong Lu
Chang Su
Tatsuya Harada
Yingying Zhu
GNN
21
9
0
19 Feb 2023
Multi-modal Machine Learning in Engineering Design: A Review and Future Directions
Binyang Song
Ruilin Zhou
Faez Ahmed
AI4CE
35
40
0
14 Feb 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
24
2
0
15 Jan 2023
Deep Biological Pathway Informed Pathology-Genomic Multimodal Survival Prediction
Lin Qiu
Aminollah Khormali
Kai Liu
11
9
0
06 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
25
16
0
26 Dec 2022
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
18
3
0
21 Dec 2022
What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?
Yang Trista Cao
Kyle Seelman
Kyungjun Lee
Hal Daumé
14
5
0
26 Oct 2022
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering
Xiaofei Huang
Hongfang Gong
MedIm
58
12
0
01 Oct 2022
IDEA: Interactive DoublE Attentions from Label Embedding for Text Classification
Ziyuan Wang
Hailiang Huang
Songqiao Han
VLM
20
2
0
23 Sep 2022
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge
Zhihong Chen
Guanbin Li
Xiang Wan
119
65
0
15 Sep 2022
Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training
Zhihong Chen
Yu Du
Jinpeng Hu
Yang Liu
Guanbin Li
Xiang Wan
Tsung-Hui Chang
81
111
0
15 Sep 2022
Recurrent Bilinear Optimization for Binary Neural Networks
Sheng Xu
Yanjing Li
Tian Wang
Teli Ma
Baochang Zhang
Peng Gao
Yu Qiao
Jinhu Lv
Guodong Guo
MQ
4
14
0
04 Sep 2022