Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.06687
Cited By
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
12 November 2022
Yusong Wu
K. Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation"
43 / 343 papers shown
Title
Natural Language Supervision for General-Purpose Audio Representations
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
AuLLM
AI4TS
13
53
0
11 Sep 2023
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
Jinzuomu Zhong
Yang Li
Hui Huang
Korin Richmond
Jie Liu
Zhiba Su
Jing Guo
Benlai Tang
Fengjie Zhu
8
1
0
11 Sep 2023
Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation
A. Sridhar
Yinyi Guo
Erik M. Visser
Rehana Mahfuz
24
5
0
06 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Erik Cambria
Björn W. Schuller
LM&MA
AuLLM
29
36
0
24 Aug 2023
Emotion-Aligned Contrastive Learning Between Images and Music
Shanti Stewart
Kleanthis Avramidis
Tiantian Feng
Shrikanth Narayanan
14
0
0
24 Aug 2023
A Survey of AI Music Generation Tools and Models
Yueyue Zhu
Jared Baca
Banafsheh Rekabdar
Reza Rawassizadeh
MGen
22
13
0
24 Aug 2023
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
Shansong Liu
Atin Sakkeer Hussain
Chenshuo Sun
Yin Shan
MLLM
24
27
0
22 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
11
37
0
18 Aug 2023
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
11
5
0
17 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
17
218
0
10 Aug 2023
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yiitan Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
19
43
0
09 Aug 2023
Transferable Models for Bioacoustics with Human Language Supervision
David Robinson
Adelaide Robinson
Lily Akrapongpisak
6
8
0
09 Aug 2023
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
K. Chen
Yusong Wu
Haohe Liu
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
DiffM
17
72
0
03 Aug 2023
LP-MusicCaps: LLM-Based Pseudo Music Captioning
Seungheon Doh
Keunwoo Choi
Jongpil Lee
Juhan Nam
27
70
0
31 Jul 2023
A Demand-Driven Perspective on Generative Audio AI
Sangshin Oh
Minsung Kang
Hyeongi Moon
Keunwoo Choi
Ben Sangbae Chon
17
3
0
10 Jul 2023
DISCO-10M: A Large-Scale Music Dataset
Luca A. Lanzendörfer
Florian Grötschla
Emil Funke
Roger Wattenhofer
14
12
0
23 Jun 2023
A Multimodal Prototypical Approach for Unsupervised Sound Classification
Saksham Singh Kushwaha
Magdalena Fuentes
14
8
0
21 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D.Plumbley
Wenwu Wang
DiffM
30
10
0
17 Jun 2023
FALL-E: A Foley Sound Synthesis Model and Strategies
Minsung Kang
Sangshin Oh
Hyeongi Moon
Kyungyun Lee
Ben Sangbae Chon
23
4
0
16 Jun 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serra
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
11
19
0
16 Jun 2023
GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition
Y. Pan
Yanni Hu
Yuguang Yang
Wen Fei
Jixun Yao
Heng Lu
Lei Ma
Jianjun Zhao
VLM
51
8
0
13 Jun 2023
Simple and Controllable Music Generation
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
MGen
19
337
0
08 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
26
7
0
01 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
J. Liu
30
95
0
29 May 2023
Adapting Language-Audio Models as Few-Shot Audio Learners
Jinhua Liang
Xubo Liu
Haohe Liu
Huy P Phan
Emmanouil Benetos
Mark D. Plumbley
Wenwu Wang
VLM
17
19
0
28 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
8
8
0
27 May 2023
Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Mark D.Plumbley
Wenwu Wang
12
9
0
25 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
13
17
0
22 May 2023
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
17
22
0
22 May 2023
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
30
155
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
16
113
0
18 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
29
133
0
18 May 2023
Unsupervised Improvement of Audio-Text Cross-Modal Representations
Zhepei Wang
Cem Subakan
Krishna Subramani
Junkai Wu
Tiago Tavares
Fabio Ayres
Paris Smaragdis
SSL
17
3
0
03 May 2023
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
Shangda Wu
Dingyao Yu
Xu Tan
Maosong Sun
CLIP
VLM
5
13
0
21 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
23
2
0
12 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
25
2
0
10 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
43
190
0
30 Mar 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo P. Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
27
471
0
29 Jan 2023
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Flavio Schneider
Ojasv Kamal
Zhijing Jin
Bernhard Schölkopf
MGen
24
83
0
27 Jan 2023
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
19
50
0
15 Dec 2022
TimbreCLIP: Connecting Timbre to Text and Images
Nicolas Jonason
Bob L. T. Sturm
CLIP
22
4
0
21 Nov 2022
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
113
50
0
28 Sep 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
114
262
0
02 Feb 2022
Previous
1
2
3
4
5
6
7