Multimodal Machine Learning: A Survey and Taxonomy

26 May 2017
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency

Papers citing "Multimodal Machine Learning: A Survey and Taxonomy"

50 / 941 papers shown
Multimodal Generalized Category Discovery
Yuchang Su
Renping Zhou
Siyu Huang
Xingjian Li
Tianyang Wang
Ziyue Wang
Min Xu
270
6
0
18 Sep 2024
PixelBytes: Catching Unified Representation for Multimodal Generation
Fabien Furfaro
126
1
0
16 Sep 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis LLabres
Marco Bertini
Dimosthenis Karatzas
335
7
0
14 Sep 2024
Early Joint Learning of Emotion Information Makes MultiModal Model Understand You Better
Mengying Ge
Mingyang Li
Dongkai Tang
Pengbo Li
Kuo Liu
Shuhao Deng
Songbai Pu
Liu Liu
Yang Song
Tao Zhang
233
7
0
12 Sep 2024
Recent Trends of Multimodal Affective Computing: A Survey from NLP Perspective
Guimin Hu
Yi Xin
Weimin Lyu
Haojian Huang
Chang Sun
Zehan Zhu
Lin Gui
Ruichu Cai
Erik Cambria
Hasti Seifi
368
15
0
11 Sep 2024
What to align in multimodal contrastive learning?
International Conference on Learning Representations (ICLR), 2024
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
341
30
0
11 Sep 2024
PixelBytes: Catching Unified Embedding for Multimodal Generation
Fabien Furfaro
99
0
0
03 Sep 2024
Subgroup Analysis via Model-based Rule Forest
IEEE International Conference on Information Reuse and Integration (IRI), 2024
I-Ling Cheng
Chan Hsu
Chantung Ku
Pei-Ju Lee
Yihuang Kang
111
0
0
27 Aug 2024
X-Reflect: Cross-Reflection Prompting for Multimodal Recommendation
Hanjia Lyu
Ryan Rossi
Xiang Chen
Md Mehrab Tanjim
Stefano Petrangeli
Somdeb Sarkhel
Jiebo Luo
136
7
0
27 Aug 2024
Has Multimodal Learning Delivered Universal Intelligence in Healthcare? A Comprehensive Survey
Information Fusion (Inf. Fusion), 2024
Qika Lin
Yifan Zhu
Xin Mei
Ling Huang
Jingying Ma
Kai He
Zhen Peng
Xiaoshi Zhong
Mengling Feng
288
61
0
23 Aug 2024
MultiMed: Massively Multimodal and Multitask Medical Understanding
Shentong Mo
Paul Pu Liang
LM&MA
245
6
0
22 Aug 2024
Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yuyang Ye
Zhi Zheng
Yishan Shen
Tianshu Wang
Hengruo Zhang
Peijun Zhu
Runlong Yu
Kai Zhang
Hui Xiong
433
44
0
19 Aug 2024
A Survey on Integrated Sensing, Communication, and Computation
IEEE Communications Surveys and Tutorials (COMST), 2024
Dingzhu Wen
Yong Zhou
Xiaoyang Li
Yuanming Shi
Kaibin Huang
Khaled B. Letaief
236
122
0
15 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
290
2
0
14 Aug 2024
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition
Ahmed Abdelkawy
Asem A. Ali
Asem Ali
3DPC
335
4
0
10 Aug 2024
A Systematic Review of Intermediate Fusion in Multimodal Deep Learning for Biomedical Applications
Image and Vision Computing (IVC), 2024
V. Guarrasi
Fatih Aksu
Camillo Maria Caruso
Francesco Di Feola
Aurora Rofena
Filippo Ruffini
Paolo Soda
OffRL, MedIm, AI4CE
188
51
0
02 Aug 2024
HyperMM : Robust Multimodal Learning with Varying-sized Inputs
Hava Chaptoukaev
Vincenzo Marcianó
Francesco Galati
Maria A. Zuluaga
199
1
0
30 Jul 2024
Appformer: A Novel Framework for Mobile App Usage Prediction Leveraging Progressive Multi-Modal Data Fusion and Feature Extraction
Expert Systems with Applications (ESWA), 2024
Chuike Sun
Junzhou Chen
Yue Zhao
Hao Han
Ruihai Jing
Guang Tan
Di Wu
224
4
0
28 Jul 2024
Automated Ensemble Multimodal Machine Learning for Healthcare
F. Imrie
Stefan Denner
Lucas S. Brunschwig
Klaus H. Maier-Hein
M. Schaar
217
11
1
25 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
319
1
0
23 Jul 2024
Resource-Efficient Federated Multimodal Learning via Layer-wise and Progressive Training
Ye Lin Tun
Chu Myaet Thwal
Minh N. H. Nguyen
Choong Seon Hong
283
4
0
22 Jul 2024
Benchmark Granularity and Model Robustness for Image-Text Retrieval
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
305
0
0
21 Jul 2024
Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation
Yu Zhang
Ruijie Yu
Kaipeng Zeng
Ding Li
Feng Zhu
Yunbo Wang
Yaohui Jin
Yanyan Xu
176
1
0
21 Jul 2024
Recent Advances in Generative AI and Large Language Models: Current Status, Challenges, and Perspectives
D. Hagos
Rick Battle
Danda B. Rawat
LM&MA, OffRL
490
84
0
20 Jul 2024
Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions
Enrico Donato
T. G. Thuruthel
Egidio Falotico
180
1
0
16 Jul 2024
IoT-LM: Large Multisensory Language Models for the Internet of Things
Shentong Mo
Russ Salakhutdinov
Louis-Philippe Morency
Paul Pu Liang
MLLM
187
20
0
13 Jul 2024
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei
Siwei Li
Ruoxuan Feng
Di Hu
215
34
0
12 Jul 2024
Specialized curricula for training vision-language models in retinal image analysis
Robbie Holland
Thomas R. P. Taylor
Christopher Holmes
Sophie Riedl
Julia Mai
...
U. Schmidt-Erfurth
Daniel Rueckert
S. Sivaprasad
A. Lotery
Fernando Navarro
VLM, LM&MA
100
0
0
11 Jul 2024
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
Siyi Du
Shaoming Zheng
Yinsong Wang
Wenjia Bai
D. O’Regan
Chen Qin
LMTD
260
20
0
10 Jul 2024
Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps
Chuanbo Hu
Bin Liu
Minglei Yin
Yilu Zhou
Xin Li
LRM
117
5
0
08 Jul 2024
Completed Feature Disentanglement Learning for Multimodal MRIs Analysis
Tianling Liu
Hongying Liu
Fanhua Shang
Lequan Yu
Tong Han
Liang Wan
381
3
0
06 Jul 2024
Multimodal Classification via Modal-Aware Interactive Enhancement
Qing-Yuan Jiang
Zhouyang Chi
Yang Yang
227
3
0
05 Jul 2024
Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning
Saeed Shurrab
Alejandro Guerra-Manzanares
Farah E. Shamout
242
4
0
05 Jul 2024
Hard-Attention Gates with Gradient Routing for Endoscopic Image Computing
Giorgio Roffo
Carlo Biffi
Pietro Salvagnini
Andrea Cherubini
222
1
0
05 Jul 2024
Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions
Christian Keup
Lenka Zdeborová
363
3
0
03 Jul 2024
Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection
Zixing Li
Chao Yan
Zhen Lan
Xiaojia Xiang
Han Zhou
Jun Lai
Dengqing Tang
288
2
0
02 Jul 2024
Multimodal Data Integration for Precision Oncology: Challenges and Future Directions
Huajun Zhou
Fengtao Zhou
Chenyu Zhao
Yingxue Xu
Luyang Luo
Hao Chen
311
18
0
28 Jun 2024
A Survey on Mixture of Experts in Large Language Models
Weilin Cai
Juyong Jiang
Fan Wang
Jing Tang
Sunghun Kim
Jiayi Huang
MoE
477
70
0
26 Jun 2024
Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas
Rafael Carranza
193
0
0
24 Jun 2024
DevBench: A multimodal developmental benchmark for language learning
Neural Information Processing Systems (NeurIPS), 2024
A. W. M. Tan
Sunny Yu
Bria Long
Wanjing Anya Ma
Tonya Murray
Rebecca D. Silverman
Jason D. Yeatman
Michael C. Frank
256
11
0
14 Jun 2024
Zoom and Shift are All You Need
Jiahao Qin
192
3
0
13 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
351
35
0
12 Jun 2024
Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model
Elaheh Baharlouei
Mahsa Shafaei
Yigeng Zhang
Hugo Jair Escalante
Thamar Solorio
217
1
0
12 Jun 2024
Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness
Satyam Kumar
Sai Srujana Buddi
U. Sarawgi
Vineet Garg
Shivesh Ranjan
Ognjen Rudovic
Ahmed Hussen Abdelaziz
Saurabh N. Adya
204
5
0
12 Jun 2024
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment
Zijia Song
Z. Zang
Yelin Wang
Guozheng Yang
Jiangbin Zheng
Kaicheng Yu
Wanyu Chen
Stan Z. Li
249
0
0
09 Jun 2024
Bayesian Structural Model Updating with Multimodal Variational Autoencoder
Computer Methods in Applied Mechanics and Engineering (CMAME), 2024
Tatsuya Itoi
Kazuho Amishiki
Sangwon Lee
T. Yaoyama
118
12
0
07 Jun 2024
Contextual fusion enhances robustness to image blurring
S. Joshi
Aiswarya Akumalla
S. Haney
Maxim Bazhenov
AAML
118
0
0
07 Jun 2024
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff
Surya Koppisetti
Nicolo Bonettini
Divyaraj Solanki
Ben Colman
Yaser Yacoob
Ali Shahriyari
Gaurav Bharaj
328
64
0
05 Jun 2024
Automatic Fused Multimodal Deep Learning for Plant Identification
Alfreds Lapkovskis
Natalia Nefedova
Ali Beikmohammadi
316
1
0
03 Jun 2024
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia
Jianyuan Guo
Kai Han
Han Wu
Chao Zhang
Chang Xu
Xinghao Chen
ViT
511
49
0
03 Jun 2024