Accommodating Audio Modality in CLIP for Multimodal Processing

AAAI Conference on Artificial Intelligence (AAAI), 2023
    VLM

Papers citing "Accommodating Audio Modality in CLIP for Multimodal Processing"

Leveraging CLIP Encoder for Multimodal Emotion Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
108
2
0
01 Jun 2025
Gramian Multimodal Representation Learning and Alignment
International Conference on Learning Representations (ICLR), 2024
354
21
0
16 Dec 2024
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
Neural Information Processing Systems (NeurIPS), 2024
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
171
6
0
01 Nov 2024
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
146
1
0
22 Jan 2024
Audio Generation with Multiple Conditional Diffusion Model
AAAI Conference on Artificial Intelligence (AAAI), 2023
164
25
0
23 Aug 2023
