
Accommodating Audio Modality in CLIP for Multimodal Processing
AAAI Conference on Artificial Intelligence (AAAI), 2023
Papers citing "Accommodating Audio Modality in CLIP for Multimodal Processing"
12 / 12 papers shown
| Title | Venue |
|---|---|
| Leveraging CLIP Encoder for Multimodal Emotion Recognition | IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025 |
| Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning | IEEE Access, 2025 |
| Gramian Multimodal Representation Learning and Alignment | International Conference on Learning Representations (ICLR), 2024 |
| Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities | Neural Information Processing Systems (NeurIPS), 2024 |
| CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing | IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024 |
| Audio Generation with Multiple Conditional Diffusion Model | AAAI Conference on Artificial Intelligence (AAAI), 2023 |
| MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | Computer Vision and Pattern Recognition (CVPR), 2022 |












