Accommodating Audio Modality in CLIP for Multimodal Processing

AAAI Conference on Artificial Intelligence (AAAI), 2023
12 March 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
    VLM

Papers citing "Accommodating Audio Modality in CLIP for Multimodal Processing"

12 / 12 papers shown
FoleyGRAM: Video-to-Audio Generation with GRAM-Aligned Multimodal Encoders
R. F. Gramaccioni
Christian Marinoni
Eleonora Grassucci
Giordano Cicchetti
A. Uncini
Danilo Comminiello
VGen
46
2
0
07 Oct 2025
Principled Multimodal Representation Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
91
2
0
23 Jul 2025
Leveraging CLIP Encoder for Multimodal Emotion Recognition
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Yehun Song
Sunyoung Cho
VLM
108
2
0
01 Jun 2025
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
IEEE Access (IEEE Access), 2025
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP VLM
350
0
0
30 Apr 2025
Gramian Multimodal Representation Learning and Alignment
International Conference on Learning Representations (ICLR), 2024
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
354
21
0
16 Dec 2024
STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft
Nicholas Lenzen
Amogh Raut
Andrew Melnik
VGen
199
0
0
01 Dec 2024
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
Neural Information Processing Systems (NeurIPS), 2024
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
SSL
171
6
0
01 Nov 2024
Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
Shunsuke Tsubaki
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Keisuke Imoto
133
1
0
16 Mar 2024
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Xianghu Yue
Xiaohai Tian
Lu Lu
Malu Zhang
Zhizheng Wu
Haizhou Li
146
1
0
22 Jan 2024
Audio Generation with Multiple Conditional Diffusion Model
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhifang Guo
Jianguo Mao
Ruijie Tao
Long Yan
Kazushige Ouchi
Hong Liu
Xiangdong Wang
DiffM
164
25
0
23 Aug 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
128
3
0
21 Jun 2023
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Computer Vision and Pattern Recognition (CVPR), 2022
Ludan Ruan
Yi Ma
Huan Yang
Huiguo He
Bei Liu
Jianlong Fu
Nicholas Jing Yuan
Qin Jin
B. Guo
DiffM VGen
250
228
0
19 Dec 2022