Multimodal Machine Learning: A Survey and Taxonomy

26 May 2017
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency

Papers citing "Multimodal Machine Learning: A Survey and Taxonomy"

50 / 941 papers shown
OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature
Alisha Srivastava
Emir Korukluoglu
Minh Nhat Le
Duyen Tran
Chau Minh Pham
Marzena Karpinska
Mohit Iyyer
275
2
0
28 May 2025
SemIRNet: A Semantic Irony Recognition Network for Multimodal Sarcasm Detection
Jingxuan Zhou
Yuehao Wu
Yibo Zhang
Yeyubei Zhang
Yunchong Liu
Bolin Huang
Chunhong Yuan
242
7
0
28 May 2025
Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sam O'Connor Russell
Naomi Harte
170
2
0
27 May 2025
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts
Jiayi Xin
Sukwon Yun
Jie Peng
Inyoung Choi
Jenna L. Ballard
Tianlong Chen
Qi Long
MoE
144
8
0
25 May 2025
Co-AttenDWG: Co-Attentive Dimension-Wise Gating and Expert Fusion for Multi-Modal Offensive Content Detection
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2025
Md. Mithun Hossain
Md. Shakil Hossain
Sudipto Chaki
M. F. Mridha
440
0
0
25 May 2025
PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
Abdul Hannan
Muhammad Arslan Manzoor
Shah Nawaz
Muhammad Irzam Liaqat
Markus Schedl
Mubashir Noman
CVBM
402
7
0
22 May 2025
Enhancing LLMs for Time Series Forecasting via Structure-Guided Cross-Modal Alignment
Siming Sun
Kai Zhang
Xuejun Jiang
Wenchao Meng
Qinmin Yang
AI4TS
203
0
0
19 May 2025
NeuroGen: Neural Network Parameter Generation via Large Language Models
Jiaqi Wang
Yusen Zhang
Xi Li
413
0
0
18 May 2025
A Survey on Side Information-driven Session-based Recommendation: From a Data-centric Perspective
IEEE Transactions on Knowledge and Data Engineering (TKDE), 2025
Xiaokun Zhang
Bo Xu
Chenliang Li
Bowei He
Hongfei Lin
Chen Ma
Fenglong Ma
256
4
0
18 May 2025
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables
Yu Gui
Cong Ma
Zongming Ma
SSL
309
2
0
18 May 2025
ExpertSteer: Intervening in LLMs through Expert Knowledge
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LLMSV
491
1
0
18 May 2025
Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding
Can Polat
Hasan Kurban
Erchin Serpedin
Mustafa Kurban
AI4CE
262
1
0
17 May 2025
Predicting Student Dropout Risk With A Dual-Modal Abrupt Behavioral Changes Approach
Jiabei Cheng
Zhen-Qun Yang
Jiannong Cao
Yu Yang
Xinzhe Zheng
141
3
0
16 May 2025
Unified Sparse-Matrix Representations for Diverse Neural Architectures
Yuzhou Zhu
172
0
0
11 May 2025
Semantic-Space-Intervened Diffusive Alignment for Visual Classification
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zixuan Li
Lei Meng
Guoqing Chao
Wei Wu
Xiaoshuo Yan
Yimeng Yang
Zhuang Qi
Xiangxu Meng
DiffM
358
0
0
09 May 2025
Multimodal Emotion Coupling via Speech-to-Facial and Bodily Gestures in Dyadic Interaction
Von Ralph Dane Marquez Herbuela
Yukie Nagai
CVBM
91
0
0
08 May 2025
Learning Item Representations Directly from Multimodal Features for Effective Recommendation
Xin Zhou
Xiaoxiong Zhang
Dusit Niyato
Zhiqi Shen
240
3
0
08 May 2025
The Multimodal Paradox: How Added and Missing Modalities Shape Bias and Performance in Multimodal AI
Kishore Sampath
Pratheesh
Ayaazuddin Mohammad
Resmi Ramachandranpillai
141
1
0
05 May 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
IEEE Access, 2025
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP, VLM
448
1
0
30 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
373
8
0
29 Apr 2025
A Survey on Multimodal Music Emotion Recognition
Rashini Liyanarachchi
Aditya Joshi
Erik Meijering
216
2
0
26 Apr 2025
Semantic-Aware Contrastive Fine-Tuning: Boosting Multimodal Malware Classification with Discriminative Embeddings
Ivan Montoya Sanchez
Shaswata Mitra
Aritran Piplai
Sudip Mittal
305
2
0
25 Apr 2025
CLIP-IT: CLIP-based Pairing for Histology Images Classification
Banafsheh Karimian
Giulia Avanzato
Soufian Belharbi
Luke McCaffrey
Mohammadhadi Shateri
Eric Granger
VLM
348
0
0
22 Apr 2025
Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis
Zichuan Liu
Liming Jiang
Qing Yan
Yumin Jia
Hao Kang
Xin Lu
DiffM
372
1
0
19 Apr 2025
Multi-Modal Data Fusion for Moisture Content Prediction in Apple Drying
Manufacturing Letters (Manuf. Lett.), 2025
Shichen Li
Chenhui Shao
137
6
0
10 Apr 2025
Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging
International Journal of Machine Learning and Cybernetics (IJMLC), 2025
Siyuan Dai
Kai Ye
Guodong Liu
Haoteng Tang
Chen Tang
MedIm
215
4
0
09 Apr 2025
Task-based Loss Functions in Computer Vision: A Comprehensive Review
Omar Elharrouss
Yasir Mahmood
Yassine Bechqito
Mohamed Adel Serhani
E. Badidi
Jamal Riffi
Hamid Tairi
392
1
0
05 Apr 2025
Interpretable Multimodal Learning for Tumor Protein-Metal Binding: Progress, Challenges, and Perspectives
Xiaokun Liu
Sayedmohammadreza Rastegari
Yijun Huang
Sxe Chang Cheong
Weikang Liu
...
Sina Tabakhi
Xianyuan Liu
Zheqing Zhu
Wei Sang
Haiping Lu
280
1
0
04 Apr 2025
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking
Information Fusion (Inf. Fusion), 2025
Chunhui Zhang
Li Liu
Jialin Gao
Xin Sun
Hao Wen
Xi Zhou
Shiming Ge
Yucheng Wang
284
4
0
02 Apr 2025
TransforMerger: Transformer-based Voice-Gesture Fusion for Robust Human-Robot Communication
Petr Vanc
Karla Stepanova
151
0
0
02 Apr 2025
SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering
International Conference on Intelligent Computing (ICIC), 2025
Bingxin Li
210
0
0
01 Apr 2025
Multimodal Machine Learning for Real Estate Appraisal: A Comprehensive Survey
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Chenya Huang
Zhidong Li
Fang Chen
Bin Liang
173
0
0
28 Mar 2025
Towards Fully Automated Decision-Making Systems for Greenhouse Control: Challenges and Opportunities
Yongshuai Liu
Taeyeong Choi
Xin Liu
AI4CE
235
0
0
27 Mar 2025
3D Convolutional Neural Networks for Improved Detection of Intracranial bleeding in CT Imaging
Bargava Subramanian
Naveen Kumarasami
Praveen Shastry
Kalyan Sivasailam
Anandakumar D
...
Harsha KG
R. García-Vázquez
Harini T
Afshin Hussain
Kishore Prasath Venkatesh
187
1
0
26 Mar 2025
Membership Inference Attacks on Large-Scale Models: A Survey
Hengyu Wu
Yang Cao
MIALM
855
7
0
25 Mar 2025
Enhanced Smart Contract Reputability Analysis using Multimodal Data Fusion on Ethereum
Cyrus Malik
Josef Bajada
Joshua Ellul
300
1
0
21 Mar 2025
NdLinear: Preserving Multi-Dimensional Structure for Parameter-Efficient Neural Networks
Alex Reneau
Jerry Yao-Chieh Hu
Zhongfang Zhuang
Ting-Chun Liu
Xiang He
Judah Goldfeder
Nadav Timor
Allen Roush
Ravid Shwartz-Ziv
HAI
427
0
0
21 Mar 2025
Unifying EEG and Speech for Emotion Recognition: A Two-Step Joint Learning Framework for Handling Missing EEG Data During Inference
Upasana Tiwari
Rupayan Chakraborty
Sunil Kumar Kopparapu
109
0
0
20 Mar 2025
Video-VoT-R1: An efficient video inference model integrating image packing and AoE architecture
Cheng Li
Jiexiong Liu
Yixuan Chen
Yanqin Jia
MLLM, VLM
255
2
0
20 Mar 2025
EarthScape: A Multimodal Dataset for Surficial Geologic Mapping and Earth Surface Analysis
Matthew Massey
Abdullah-Al-Zubaer Imran
226
0
0
19 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
703
8
0
19 Mar 2025
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu
Siyuan Meng
Yanting Gao
Song Mao
Pinlong Cai
Guohang Yan
Yirong Chen
Zilin Bian
Ding Wang
Botian Shi
364
12
0
17 Mar 2025
A Multimodal Fusion Model Leveraging MLP Mixer and Handcrafted Features-based Deep Learning Networks for Facial Palsy Detection
Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025
Heng Yim Nicole Oo
Min Hun Lee
Jeong Hoon Lim
CVBM
255
1
0
13 Mar 2025
Aligning Instance-Semantic Sparse Representation towards Unsupervised Object Segmentation and Shape Abstraction with Repeatable Primitives
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2025
Jiaxin Li
Hongxing Wang
Jiawei Tan
Zhilong Ou
Junsong Yuan
3DPC
193
1
0
10 Mar 2025
Bimodal Connection Attention Fusion for Speech Emotion Recognition
Jiachen Luo
Huy Phan
Lin Wang
Joshua D. Reiss
375
0
0
08 Mar 2025
STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
Computer Vision and Pattern Recognition (CVPR), 2025
Siyi Du
Xinzhe Luo
D. O’Regan
Chen Qin
394
3
0
08 Mar 2025
A kinetic-based regularization method for data science applications
Abhisek Ganguly
Alessandro Gabbana
Vybhav Rao
Sauro Succi
Santosh Ansumali
361
4
0
06 Mar 2025
Rebalanced Multimodal Learning with Data-aware Unimodal Sampling
Qingyuan Jiang
Zhouyang Chi
Xiao Ma
Qirong Mao
Yang Yang
Jinhui Tang
221
1
0
05 Mar 2025
Reliable Multimodal Learning Via Multi-Level Adaptive DeConfusion
Tianze Zhang
Shu Shen
Chao Chen
373
0
0
27 Feb 2025
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
Zhongyang Li
Ziyue Li
Wanrong Zhu
MoE
469
3
0
27 Feb 2025