arXiv: 1705.09406
Multimodal Machine Learning: A Survey and Taxonomy

26 May 2017
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency

Papers citing "Multimodal Machine Learning: A Survey and Taxonomy"

50 / 941 papers shown
Can multimodal representation learning by alignment preserve modality-specific information?
Romain Thoreau
Jessie Levillain
Dawa Derksen
107
0
0
22 Sep 2025
Graph Coloring for Multi-Task Learning
Santosh Patapati
263
0
0
21 Sep 2025
VisMoDAl: Visual Analytics for Evaluating and Improving Corruption Robustness of Vision-Language Models
Huanchen Wang
Wencheng Zhang
Zhiqiang Wang
Zhicong Lu
Yuxin Ma
131
0
0
18 Sep 2025
Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems
Saeed Amizadeh
Sara Abdali
Yinheng Li
K. Koishida
175
0
0
18 Sep 2025
Music4All A+A: A Multimodal Dataset for Music Information Retrieval Tasks
Jonas Geiger
Marta Moscati
Shah Nawaz
Markus Schedl
VLM
96
0
0
18 Sep 2025
A Scenario-Driven Cognitive Approach to Next-Generation AI Memory
Linyue Cai
Yuyang Cheng
Xiaoding Shao
Huiming Wang
Yong Zhao
Wei Zhang
Kang Li
130
3
0
16 Sep 2025
DyKen-Hyena: Dynamic Kernel Generation via Cross-Modal Attention for Multimodal Intent Recognition
Yifei Wang
Wenbin Wang
Yong Luo
93
0
0
12 Sep 2025
UOPSL: Unpaired OCT Predilection Sites Learning for Fundus Image Diagnosis Augmentation
Zhihao Zhao
Yinzheng Zhao
Junjie Yang
Xiangtong Yao
Quanmin Liang
Daniel Zapp
Kai Huang
Nassir Navab
M. A. Nasseri
140
0
0
10 Sep 2025
Exploring Urban Factors with Autoencoders: Relationship Between Static and Dynamic Features
Ximena Pocco
Waqar Hassan
Karelia Salinas
Vladimir Molchanov
Luis G. Nonato
87
0
0
07 Sep 2025
Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
Marzieh Ajirak
Oded Bein
Ellen Rose Bowen
Dora Kanellopoulos
Avital Falk
Faith M. Gunning
Nili Solomonov
Logan Grosenick
163
0
0
06 Sep 2025
Artificial intelligence for representing and characterizing quantum systems
Yuxuan Du
Yan Zhu
Y. Zhang
Min-hsiu Hsieh
Patrick Rebentrost
...
Ya-Dong Wu
Jens Eisert
G. Chiribella
Dacheng Tao
B. Sanders
175
3
0
05 Sep 2025
Balanced Multimodal Learning: An Unidirectional Dynamic Interaction Perspective
Shijie Wang
Li Zhang
Xinyan Liang
Y. Qian
Shen Hu
230
0
0
02 Sep 2025
A Multimodal Deep Learning Framework for Early Diagnosis of Liver Cancer via Optimized BiLSTM-AM-VMD Architecture
Cheng Cheng
Zeping Chen
Xavier Wang
198
0
0
01 Sep 2025
MVRS: The Multimodal Virtual Reality Stimuli-based Emotion Recognition Dataset
Seyed Muhammad Hossein Mousavi
Atiye Ilanloo
114
0
0
31 Aug 2025
Speech Emotion Recognition via Entropy-Aware Score Selection
ChenYi Chua
JunKai Wong
Chengxin Chen
Xiaoxiao Miao
101
0
0
28 Aug 2025
Developing a Multi-Modal Machine Learning Model For Predicting Performance of Automotive Hood Frames
Abhishek Indupally
Satchit Ramnath
AI4CE
52
0
0
28 Aug 2025
AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning
Shu Shen
Chao Chen
Tong Zhang
233
0
0
27 Aug 2025
Dynamic Embedding of Hierarchical Visual Features for Efficient Vision-Language Fine-Tuning
Xinyu Wei
Guoli Yang
Jialu Zhou
Mingyue Yang
Leqian Li
Kedi Zhang
Chunping Qiu
VLM
127
0
0
25 Aug 2025
EGRA: Toward Enhanced Behavior Graphs and Representation Alignment for Multimodal Recommendation
Xiaoxiong Zhang
Xin Zhou
Zhiwei Zeng
Yongjie Wang
Dusit Niyato
Zhiqi Shen
176
0
0
22 Aug 2025
Multimodal Quantum Vision Transformer for Enzyme Commission Classification from Biochemical Representations
Murat Isik
M. Saggi
Humaira Gowher
Sabre Kais
37
0
0
20 Aug 2025
Multimodal Data Storage and Retrieval for Embodied AI: A Survey
Yihao Lu
Hao Tang
140
2
0
19 Aug 2025
GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering
Farhaan Ebadulla
Chiraag Mudlapur
Gaurav BV
130
0
0
19 Aug 2025
SPANER: Shared Prompt Aligner for Multimodal Semantic Representation
Thye Shan Ng
Caren Soyeon Han
Eun-Jung Holden
135
0
0
18 Aug 2025
FedUNet: A Lightweight Additive U-Net Module for Federated Learning with Heterogeneous Models
Beomseok Seo
Kichang Lee
JaeYeon Park
FedML
120
0
0
18 Aug 2025
Arabic Multimodal Machine Learning: Datasets, Applications, Approaches, and Challenges
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
Ahmed Abdelali
140
1
0
17 Aug 2025
UniCast: A Unified Multimodal Prompting Framework for Time Series Forecasting
Sehyuk Park
S. Han
Eduard Hovy
AI4TS
117
0
0
16 Aug 2025
MUJICA: Reforming SISR Models for PBR Material Super-Resolution via Cross-Map Attention
Xin Du
Maoyuan Xu
Zhi Ying
124
0
0
13 Aug 2025
Does Multimodality Improve Recommender Systems as Expected? A Critical Analysis and Future Directions
Hongyu Zhou
Yinan Zhang
Aixin Sun
Zhiqi Shen
119
1
0
07 Aug 2025
LUST: A Multi-Modal Framework with Hierarchical LLM-based Scoring for Learned Thematic Significance Tracking in Multimedia Content
Anderson de Lima Luiz
60
0
0
06 Aug 2025
Explainable Deep Neural Network for Multimodal ECG Signals: Intermediate vs Late Fusion
Timothy Oladunni
Ehimen Aneni
172
3
0
06 Aug 2025
CM$^3$: Calibrating Multimodal Recommendation
Xin Zhou
Yongjie Wang
Zhiqi Shen
116
1
0
02 Aug 2025
Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game
European Conference on Technology Enhanced Learning (EC-TEL), 2025
Clemens Witt
Thiemo Leonhardt
Nadine Bergner
Mareen Grillenberger
OffRL
57
0
0
30 Jul 2025
Automated Detection of Antarctic Benthic Organisms in High-Resolution In Situ Imagery to Aid Biodiversity Monitoring
Cameron Trotter
Huw Griffiths
Tasnuva Ming Khan
Rowan Whittle
115
0
0
29 Jul 2025
Large Language Models for Crash Detection in Video: A Survey of Methods, Datasets, and Challenges
Sanjeda Akter
Ibne Farabi Shihab
Anuj Sharma
VLM
300
2
0
02 Jul 2025
Equitable Electronic Health Record Prediction with FAME: Fairness-Aware Multimodal Embedding
Nikkie Hooman
Zhongjie Wu
Eric C. Larson
Mehak Gupta
155
0
0
16 Jun 2025
A Survey on World Models Grounded in Acoustic Physical Information
Xiaoliang Chen
Le Chang
Xin Yu
Yunhe Huang
Xianling Tu
SyDa
AI4CE
184
1
0
16 Jun 2025
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Haotian Ni
Yake Wei
Hang Liu
Gong Chen
Chong Peng
Hao Lin
Di Hu
OffRL
295
1
0
13 Jun 2025
MF2Summ: Multimodal Fusion for Video Summarization with Temporal Alignment
Shuo Wang
Jihao Zhang
222
1
0
12 Jun 2025
Optimizing Genetic Algorithms with Multilayer Perceptron Networks for Enhancing TinyFace Recognition
Mohammad Subhi Al-Batah
Mowafaq Salem Alzboon
Muhyeeddin Alqaraleh
CVBM
205
0
0
11 Jun 2025
Segment Any Architectural Facades (SAAF): An automatic segmentation model for building facades, walls and windows based on multimodal semantics guidance
Peilin Li
Jun Yin
Jing Zhong
Ran Luo
Pengyu Zeng
Miao Zhang
227
0
0
09 Jun 2025
Representation Decomposition for Learning Similarity and Contrastness Across Modalities for Affective Computing
Yuanhe Tian
Pengsen Cheng
Guoqing Jin
Lei Zhang
Yan Song
132
3
0
08 Jun 2025
CAtCh: Cognitive Assessment through Cookie Thief
International Conference on Digital Health (ICDH), 2025
Joseph T Colonel
Carolyn Hagler
Guiselle Wismer
Laura Curtis
Jacqueline Becker
Juan Wisnivesky
Alex Federman
Gaurav Pandey
115
0
0
07 Jun 2025
Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation
John Waithaka
Moise Busogi
SSL
168
0
0
07 Jun 2025
Towards Efficient Multi-LLM Inference: Characterization and Analysis of LLM Routing and Hierarchical Techniques
Adarsh Prasad Behera
J. Champati
Roberto Morabito
Sasu Tarkoma
J. Gross
200
5
0
06 Jun 2025
Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model
Hugo Tabanelli
Pierre Mergny
Lenka Zdeborová
Florent Krzakala
172
1
0
03 Jun 2025
MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping
Xiaojun Shan
Qi Cao
Xing Han
Haofei Yu
Paul Liang
280
1
0
02 Jun 2025
TIME: TabPFN-Integrated Multimodal Engine for Robust Tabular-Image Learning
Jiaqi Luo
Yuan Yuan
Shixin Xu
LMTD
AI4TS
216
1
0
01 Jun 2025
Leveraging CLIP Encoder for Multimodal Emotion Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Yehun Song
Sunyoung Cho
VLM
176
4
0
01 Jun 2025
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Yuyuan Liu
Yuanhong Chen
Chong Wang
Junlin Han
Junde Wu
Can Peng
Jingkun Chen
Yu Tian
Gustavo Carneiro
VLM
299
0
0
01 Jun 2025
A Survey of Generative Categories and Techniques in Multimodal Generative Models
Longzhen Han
Awes Mubarak
Almas Baimagambetov
Nikolaos Polatidis
Thar Baker
LRM
404
0
0
29 May 2025