Multimodal Learning with Transformers: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

13 June 2022

Papers citing "Multimodal Learning with Transformers: A Survey"

Showing 50 of 305 citing papers

Fusing Echocardiography Images and Medical Records for Continuous Patient Stratification

IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control (IEEE TUFFC), 2024

236

15 Jan 2024

Transformer for Object Re-Identification: A Survey

International Journal of Computer Vision (IJCV), 2024

Bo Du

427

13 Jan 2024

A Temporal-Spectral Fusion Transformer with Subject-Specific Adapter for Enhancing RSVP-BCI Decoding

Neural Networks (NN), 2024

208

12 Jan 2024

Complementary Information Mutual Learning for Multimodality Medical Image Segmentation

Chuyun Shen

261

05 Jan 2024

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection

AAAI Conference on Artificial Intelligence (AAAI), 2024

266

04 Jan 2024

Inter-X: Towards Versatile Human-Human Interaction Analysis

...

312

26 Dec 2023

From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape

418

18 Dec 2023

Can Physician Judgment Enhance Model Trustworthiness? A Case Study on Predicting Pathological Lymph Nodes in Rectal Cancer

190

15 Dec 2023

Non-contact Multimodal Indoor Human Monitoring Systems: A Survey

Constantino Álvarez Casado

Xiaoting Wu

Olli Silvén

D. Jayagopi

Miguel Bordallo López

209

11 Dec 2023

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Computer Vision and Pattern Recognition (CVPR), 2023

318

175

30 Nov 2023

Large Model Based Referring Camouflaged Object Detection

269

28 Nov 2023

Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking

Bo Liu

378

28 Nov 2023

Images Connect Us Together: Navigating a COVID-19 Local Outbreak in China Through Social Media Images

154

18 Nov 2023

Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference

Marvin Schmitt

Stefan T. Radev

Paul-Christian Bürkner

378

17 Nov 2023

Vision-Language Instruction Tuning: A Review and Analysis

Ying Shan

322

14 Nov 2023

Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

206

12 Nov 2023

Conceptual Model Interpreter for Large Language Models

International Conference on Conceptual Modeling (ER), 2023

Felix Härer

165

11 Nov 2023

OmniVec: Learning robust representations with cross modal sharing

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Siddharth Srivastava

Gaurav Sharma

SSL

288

07 Nov 2023

Dynamic Multimodal Information Bottleneck for Multimodality Classification

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Guang Yang

377

02 Nov 2023

MM-VID: Advancing Video Understanding with GPT-4V(ision)

...

Zicheng Liu

234

30 Oct 2023

Generating Context-Aware Natural Answers for Questions in 3D Scenes

British Machine Vision Conference (BMVC), 2023

Mohammed Munzer Dwedari

Matthias Niessner

Dave Zhenyu Chen

194

30 Oct 2023

CAD -- Contextual Multi-modal Alignment for Dynamic AVQA

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

310

25 Oct 2023

Density of States Prediction of Crystalline Materials via Prompt-guided Multi-Modal Transformer

Neural Information Processing Systems (NeurIPS), 2023

297

24 Oct 2023

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

461

228

16 Oct 2023

Can We Edit Multimodal Large Language Models?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Huajun Chen

Ningyu Zhang

MLLM

611

12 Oct 2023

Robust Multimodal Learning with Missing Modalities via Parameter-Efficient Adaptation

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Md Kaykobad Reza

Ashley Prater-Bennette

M. Salman Asif

310

06 Oct 2023

A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4

Natural Language Processing Journal (JNLP), 2023

Katikapalli Subramanyam Kalyan

LM&MA AI4CE LRM AILaw ELM

299

344

04 Oct 2023

Modality-aware Transformer for Financial Time series Forecasting

International Conference on AI in Finance (ICAF), 2023

136

02 Oct 2023

Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets

Italian National Conference on Sensors (INS), 2023

276

30 Sep 2023

PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers

Yuxuan Liu

Zecheng Zhang

Hayden Schaeffer

211

28 Sep 2023

RPEFlow: Multimodal Fusion of RGB-PointCloud-Event for Joint Optical Flow and Scene Flow Estimation

IEEE International Conference on Computer Vision (ICCV), 2023

Yuchao Dai

258

26 Sep 2023

A Survey on Image-text Multimodal Models

Ruifeng Guo

Jingxuan Wei

Linzhuang Sun

Khai-Nguyen Nguyen

Guiyong Chang

Dawei Liu

Sibo Zhang

Zhengbing Yao

Mingjun Xu

Liping Bu

VLM

328

23 Sep 2023

RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing

IEEE Transactions on Intelligent Vehicles (TIV), 2023

396

19 Sep 2023

VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model

144

15 Sep 2023

Deep evidential fusion with uncertainty quantification and contextual discounting for multimodal medical image segmentation

231

12 Sep 2023

A Survey on Interpretable Cross-modal Reasoning

400

05 Sep 2023

Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds

359

01 Sep 2023

Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients

164

01 Sep 2023

Spoken Language Intelligence of Large Language Models for Language Learning

286

28 Aug 2023

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Tobias Christian Nauen

Sebastián M. Palacio

Federico Raue

Andreas Dengel

617

18 Aug 2023

CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation

IEEE International Conference on Computer Vision (ICCV), 2023

Xiaodan Liang

139

14 Aug 2023

Foundational Models Defining a New Era in Vision: A Survey and Outlook

Muhammad Awais

Muzammal Naseer

Salman Khan

Rao Muhammad Anwer

Hisham Cholakkal

434

151

25 Jul 2023

Audio-aware Query-enhanced Transformer for Audio-Visual Segmentation

271

25 Jul 2023

Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment

IEEE International Conference on Computer Vision (ICCV), 2023

284

24 Jul 2023

Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework

Jingxuan Wei

Cheng Tan

Zhangyang Gao

Linzhuang Sun

Siyuan Li

Bihui Yu

R. Guo

Stan Z. Li

LRM

373

24 Jul 2023

Robust Visual Question Answering: Datasets, Methods, and Future Challenges

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Pinghui Wang

Jun Liu

333

21 Jul 2023

Transformers in Reinforcement Learning: A Survey

Samira Ebrahimi Kahou

OffRL

251

12 Jul 2023

Transformers in Healthcare: A Survey

...

261

30 Jun 2023

MPM: A Unified 2D-3D Human Pose Representation via Masked Pose Modeling

Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023

Zhongyu Jiang

165

29 Jun 2023

Towards Open Vocabulary Learning: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Xiangtai Li

...

Jiangning Zhang

410

218

28 Jun 2023