Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

ACM Computing Surveys (ACM CSUR), 2022

7 September 2022

Paul Pu Liang

Amir Zadeh

Louis-Philippe Morency

Papers citing "Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions"

50 / 56 papers shown

DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis

IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2025

150

05 Dec 2025

Exploring Fusion Strategies for Multimodal Vision-Language Systems

Regan Willis

Jason Bakos

115

26 Nov 2025

Advanced Data Collection Techniques in Cloud Security: A Multi-Modal Deep Learning Autoencoder Approach

Aamiruddin Syed

Mohammed Ilyas Ahmad

26 Nov 2025

Real-Time Inference for Distributed Multimodal Systems under Communication Delay Uncertainty

Victor Croisfelt

João Henrique Inacio de Souza

Shashi Raj Pandey

B. Soret

P. Popovski

219

20 Nov 2025

When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning

168

04 Nov 2025

FairGRPO: Fair Reinforcement Learning for Equitable Clinical Reasoning

261

22 Oct 2025

Improving Speech Emotion Recognition with Mutual Information Regularized Generative Model

218

11 Oct 2025

Human Behavior Atlas: Benchmarking Unified Psychological and Social Behavior Understanding

...

215

06 Oct 2025

Massively Multimodal Foundation Models: A Framework for Capturing Interactions with Specialized Mixture-of-Experts

321

30 Sep 2025

IndiSeek learns information-guided disentangled representations

547

25 Sep 2025

Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data

Jiancheng Zhang

Yinglun Zhu

239

25 Sep 2025

M3ET: Efficient Vision-Language Learning for Robotics based on Multimodal Mamba-Enhanced Transformer

177

22 Sep 2025

Arabic Multimodal Machine Learning: Datasets, Applications, Approaches, and Challenges

244

17 Aug 2025

Multimodal Remote Inference

132

11 Aug 2025

MLLM-based Speech Recognition: When and How is Multimodality Beneficial?

286

25 Jul 2025

IsoNet: Causal Analysis of Multimodal Transformers for Neuromuscular Gesture Classification

167

20 Jun 2025

A Survey on Large Language Models for Mathematical Reasoning

...

377

10 Jun 2025

PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts

...

347

06 Jun 2025

MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping

389

02 Jun 2025

ICYM2I: The illusion of multimodal informativeness under missingness

419

22 May 2025

Understanding the Capabilities of Molecular Graph Neural Networks in Materials Science Through Multimodal Learning and Physical Context Encoding

326

17 May 2025

Improving Coverage in Combined Prediction Sets with Weighted p-values

351

17 May 2025

Robust Understanding of Human-Robot Social Interactions through Multimodal Distillation

Tongfei Bian

Mathieu Chollet

T. Guha

383

06 May 2025

POET: Supporting Prompting Creativity and Personalization with Automated Expansion of Text-to-Image Generation

ACM Symposium on User Interface Software and Technology (UIST), 2025

493

18 Apr 2025

Engineering Artificial Intelligence: Framework, Challenges, and Future Direction

524

03 Apr 2025

Translating Multimodal AI into Real-World Inspection: TEMAI Evaluation Framework and Pathways for Implementation

185

31 Mar 2025

Multimodal Machine Learning for Real Estate Appraisal: A Comprehensive Survey

Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 2025

237

28 Mar 2025

Multi-modal Time Series Analysis: A Tutorial and Survey

987

17 Mar 2025

DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning

450

14 Mar 2025

Transforming Traditional Neural Networks into Neuromorphic Quantum-Cognitive Models: A Tutorial with Applications

Milan Maksimovic

Ivan S. Maksymov

359

10 Mar 2025

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

411

09 Mar 2025

DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning

494

09 Mar 2025

TabulaTime: A Novel Multimodal Deep Learning Framework for Advancing Acute Coronary Syndrome Prediction through Environmental and Clinical Data Integration

434

24 Feb 2025

MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models

601

23 Feb 2025

Understanding the Emergence of Multimodal Representation Alignment

431

22 Feb 2025

Modality Interactive Mixture-of-Experts for Fake News Detection

The Web Conference (WWW), 2025

427

21 Jan 2025

MM-Path: Multi-modal, Multi-granularity Path Representation Learning -- Extended Version

Knowledge Discovery and Data Mining (KDD), 2024

748

03 Jan 2025

Designing a Robust Radiology Report Generation System

Sonit Singh

MedIm

310

02 Nov 2024

Progressive Compositionality in Text-to-Image Generative Models

International Conference on Learning Representations (ICLR), 2024

541

22 Oct 2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

402

16 Oct 2024

Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations

531

02 Oct 2024

Rethinking the Power of Timestamps for Robust Time Series Forecasting: A Global-Local Fusion Perspective

Neural Information Processing Systems (NeurIPS), 2024

Qi Qi

Jianxin Liao

228

27 Sep 2024

Fusion in Context: A Multimodal Approach to Affective State Recognition

353

18 Sep 2024

Segment Anything with Multiple Modalities

362

17 Aug 2024

End-to-end Semantic-centric Video-based Multimodal Affective Computing

358

14 Aug 2024

IoT-LM: Large Multisensory Language Models for the Internet of Things

Shentong Mo

Russ Salakhutdinov

Louis-Philippe Morency

Paul Pu Liang

MLLM

222

13 Jul 2024

HEMM: Holistic Evaluation of Multimodal Foundation Models

Paul Pu Liang

Louis-Philippe Morency

438

03 Jul 2024

RiskLabs: Predicting Financial Risk Using Large Language Model based on Multimodal and Multi-Sources Data

234

11 Apr 2024

Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision

...

252

10 Apr 2024

Cohort-Individual Cooperative Learning for Multimodal Cancer Survival Analysis

IEEE Transactions on Medical Imaging (IEEE TMI), 2024

Huajun Zhou

Fengtao Zhou

Hao Chen

243

03 Apr 2024