Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

13 June 2022

Papers citing "Multimodal Learning with Transformers: A Survey"

50 / 305 papers shown

Generate to Understand for Representation

296

14 Jun 2023

Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

394

13 Jun 2023

Modality Influence in Multimodal Machine Learning

227

10 Jun 2023

Towards Arabic Multimodal Dataset for Sentiment Analysis

International Conference on Intelligent Data Science Technologies and Applications (IDSTA), 2023

10 Jun 2023

Learning to Ground Instructional Articles in Videos through Narrations

IEEE International Conference on Computer Vision (ICCV), 2023

E. Mavroudi

Triantafyllos Afouras

Lorenzo Torresani

DiffM

217

06 Jun 2023

Backchannel Detection and Agreement Estimation from Video with Transformer Networks

IEEE International Joint Conference on Neural Networks (IJCNN), 2023

A. Amer

Chirag Bhuvaneshwara

G. Addluri

Mohammed Maqsood Shaik

Vedant Bonde

Philippe Muller

224

02 Jun 2023

Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification

IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023

David Hoffmann

Kai Norman Clasen

Begüm Demir

109

02 Jun 2023

Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data

Nathan Vaska

Victoria Helus

LRM

103

01 Jun 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting

IEEE International Joint Conference on Neural Networks (IJCNN), 2023

Qiong Wu

123

01 Jun 2023

Large language models improve Alzheimer's disease diagnosis using multi-modality data

164

26 May 2023

GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

Zhitong Xiong

Sining Chen

Yi Wang

Lichao Mou

Xiao Xiang Zhu

147

24 May 2023

PanoContext-Former: Panoramic Total Scene Understanding with a Transformer

Computer Vision and Pattern Recognition (CVPR), 2023

248

21 May 2023

Efficient Multimodal Neural Networks for Trigger-less Voice Assistants

Interspeech (Interspeech), 2023

177

20 May 2023

Transavs: End-To-End Audio-Visual Segmentation With Transformer

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Jiangning Zhang

Yabiao Wang

159

12 May 2023

Multimodal Understanding Through Correlation Maximization and Minimization

Yi Shi

Marc Niethammer

190

04 May 2023

Early Classifying Multimodal Sequences

International Conference on Multimodal Interaction (ICMI), 2023

Alexander Cao

J. Utke

Diego Klabjan

134

02 May 2023

MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer

IEEE International Joint Conference on Neural Networks (IJCNN), 2023

Yang Li

253

29 Apr 2023

A Review of ChatGPT Applications in Education, Marketing, Software Engineering, and Healthcare: Benefits, Drawbacks, and Research Directions

Mohammad Fraiwan

Natheer Khasawneh

219

29 Apr 2023

Representation Matters: The Game of Chess Poses a Challenge to Vision Transformers

European Conference on Artificial Intelligence (ECAI), 2023

Johannes Czech

Kristian Kersting

ViT

153

28 Apr 2023

Multi-Modal Deep Learning for Credit Rating Prediction Using Text and Numerical Data Streams

Applied Soft Computing (Appl. Soft Comput.), 2023

204

21 Apr 2023

Transformer-Based Visual Segmentation: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Xiangtai Li

370

244

19 Apr 2023

Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Computer Vision and Pattern Recognition (CVPR), 2023

Guillaume Jaume

Anurag J. Vaidya

Richard J. Chen

Drew F. K. Williamson

Paul Pu Liang

Faisal Mahmood

412

102

13 Apr 2023

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Liang Wang

479

140

06 Apr 2023

Multi-Modal Perceiver Language Model for Outcome Prediction in Emergency Department

Sabri Boughorbel

Fethi Jarray

Abdulaziz Yousuf Al-Homaid

Rashid Niaz

Khalid Alyafei

156

03 Apr 2023

Vision-Language Models for Vision Tasks: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

495

1,005

03 Apr 2023

Multimodal Hyperspectral Image Classification via Interconnected Fusion

229

02 Apr 2023

Self-Supervised Multimodal Learning: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Yongshuo Zong

Oisin Mac Aodha

Timothy M. Hospedales

SSL

319

31 Mar 2023

What Can Human Sketches Do for Object Detection?

Computer Vision and Pattern Recognition (CVPR), 2023

Pinaki Nath Chowdhury

311

27 Mar 2023

Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

191

25 Mar 2023

Building artificial neural circuits for domain-general cognition: a primer on brain-inspired systems-level architecture

128

21 Mar 2023

A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?

Yu Qiao

...

Lik-Hang Lee

Yang Yang

Heng Tao Shen

In So Kweon

Choong Seon Hong

303

199

21 Mar 2023

Transformers in Speech Processing: A Survey

448

21 Mar 2023

Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review

Asim Waqas

Aakash Tripathi

Ravichandran Ramachandran

Paul Stewart

Ghulam Rasool

AI4CE

477

11 Mar 2023

Quantifying & Modeling Multimodal Interactions: An Information Decomposition Framework

Neural Information Processing Systems (NeurIPS), 2023

...

Louis-Philippe Morency

408

23 Feb 2023

Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey

Machine Intelligence Research (MIR), 2023

Yaowei Wang

Yonghong Tian

Wen Gao

AI4CE VLM

460

272

20 Feb 2023

Transformadores: Fundamentos teoricos y Aplicaciones

J. D. L. Torre

292

18 Feb 2023

PrefixMol: Target- and Chemistry-aware Molecule Design via Prefix Embedding

Zhangyang Gao

Yuqi Hu

Cheng Tan

Stan Z. Li

272

14 Feb 2023

Understanding Multimodal Contrastive Learning and Incorporating Unpaired Data

International Conference on Artificial Intelligence and Statistics (AISTATS), 2023

381

13 Feb 2023

On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective

Jingxiao Chen

211

24 Dec 2022

Cognitive Accident Prediction in Driving Scenes: A Multimodality Benchmark

Jianwu Fang

Jianru Xue

382

19 Dec 2022

Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics

International Conference on Machine Learning (ICML), 2022

481

15 Dec 2022

A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

456

225

12 Dec 2022

Multimodal Learning for Multi-Omics: A Survey

236

29 Nov 2022

An Inclusive Notion of Text

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Ilia Kuznetsov

Iryna Gurevych

161

10 Nov 2022

Vision+X: A Survey on Multimodal Learning in the Light of Data

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

Ye Zhu

Yuehua Wu

Andrii Zadaianchuk

Yan Yan

354

05 Oct 2022

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

ACM Computing Surveys (ACM CSUR), 2022

Paul Pu Liang

Amir Zadeh

Louis-Philippe Morency

310

163

07 Sep 2022

Multimodal learning with graphs

Nature Machine Intelligence (Nat. Mach. Intell.), 2022

578

137

07 Sep 2022

CKD-TransBTS: Clinical Knowledge-Driven Hybrid Transformer with Modality-Correlated Cross-Attention for Brain Tumor Segmentation

IEEE Transactions on Medical Imaging (IEEE TMI), 2022

Hao Chen

...

194

126

15 Jul 2022

Transformers in 3D Point Clouds: A Survey

Mingqiang Wei

Jonathan Li

316

16 May 2022

SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

Computer Vision and Pattern Recognition (CVPR), 2022

Pinaki Nath Chowdhury

397

25 Apr 2022