Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

12 November 2022
Yusong Wu
K. Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
    CLIP

Papers citing "Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation"

50 / 343 papers shown
Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition
Zeyu Li
Suncheng Xiang
Tong Yu
Jingsheng Gao
Jiacheng Ruan
Yanping Hu
Ting Liu
Yuzhuo Fu
12
0
0
04 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
33
74
0
25 Dec 2023
A Language-based solution to enable Metaverse Retrieval
Ali Abdari
Alex Falcon
Giuseppe Serra
DiffM
11
4
0
22 Dec 2023
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu
Hangting Chen
Jianwei Yu
Qiaochu Huang
Zhiyong Wu
Shixiong Zhang
Guangzhi Li
Yi Luo
Rongzhi Gu
12
22
0
16 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
35
3
0
15 Dec 2023
WikiMuTe: A web-sourced dataset of semantic descriptions for music audio
Benno Weck
Holger Kirchhoff
Peter Grosche
Xavier Serra
VLM
8
2
0
14 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM
MLLM
17
36
0
11 Dec 2023
Speaker-Text Retrieval via Contrastive Learning
Xuechen Liu
Xin Wang
Erica Cooper
Xiaoxiao Miao
Junichi Yamagishi
VLM
14
0
0
11 Dec 2023
SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
Stephen Brade
Bryan Wang
Maurício Sousa
Gregory Lee Newsome
Sageev Oore
Tovi Grossman
13
1
0
07 Dec 2023
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang
Yuehuai Liu
Yu-Wing Tai
Chi-Keung Tang
DiffM
30
4
0
29 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Zhixi Cai
Shreya Ghosh
Aman Pankaj Adatia
Munawar Hayat
Abhinav Dhall
Kalin Stefanov
11
26
0
26 Nov 2023
PortfolioMentor: Multimodal Generative AI Companion for Learning and Crafting Interactive Digital Art Portfolios
Tao Long
Weirui Peng
24
1
0
23 Nov 2023
Boosting Audio-visual Zero-shot Learning with Large Language Models
Haoxing Chen
Yaohui Li
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
Jun Lan
Huijia Zhu
Weiqiang Wang
VLM
24
1
0
21 Nov 2023
A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness
Mathias Vogel
DiffM
21
0
0
17 Nov 2023
The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
Ilaria Manco
Benno Weck
Seungheon Doh
Minz Won
Yixiao Zhang
...
Philip Tovstogan
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
Juhan Nam
19
25
0
16 Nov 2023
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis
Ge Zhu
Yutong Wen
M. Carbonneau
Zhiyao Duan
DiffM
38
7
0
15 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
16
10
0
14 Nov 2023
The taste of IPA: Towards open-vocabulary keyword spotting and forced alignment in any language
Jian Zhu
Changbing Yang
Farhan Samir
Jahurul Islam
25
4
0
14 Nov 2023
Music ControlNet: Multiple Time-varying Controls for Music Generation
Shih-Lun Wu
Chris Donahue
Shinji Watanabe
Nicholas J. Bryan
DiffM
MGen
13
48
0
13 Nov 2023
InstrumentGen: Generating Sample-Based Musical Instruments From Text
S. Nercessian
Johannes Imort
9
2
0
07 Nov 2023
FLAP: Fast Language-Audio Pre-training
Ching-Feng Yeh
Po-Yao Huang
Vasu Sharma
Shang-Wen Li
Gargi Ghosh
CLIP
VLM
20
8
0
02 Nov 2023
In-Context Prompt Editing For Conditional Audio Generation
Ernie Chang
Pin-Jie Lin
Yang Li
Sidd Srinivasan
Gaël Le Lan
David Kant
Yangyang Shi
Forrest N. Iandola
Vikas Chandra
DiffM
19
3
0
01 Nov 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM
VOS
26
2
0
28 Oct 2023
Content-based Controls For Music Large Language Modeling
Liwei Lin
Gus Xia
Junyan Jiang
Yixiao Zhang
11
14
0
26 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
21
0
0
25 Oct 2023
On the Language Encoder of Contrastive Cross-modal Models
Mengjie Zhao
Junya Ono
Zhi-Wei Zhong
Chieh-Hsin Lai
Yuhta Takida
Naoki Murata
Wei-Hsiang Liao
Takashi Shibuya
Hiromi Wakaki
Yuki Mitsufuji
VLM
28
0
0
20 Oct 2023
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Yixiao Zhang
Akira Maezawa
Gus Xia
Kazuhiko Yamamoto
Simon Dixon
44
15
0
19 Oct 2023
High-Fidelity Noise Reduction with Differentiable Signal Processing
C. Steinmetz
Thomas Walther
Joshua D. Reiss
14
3
0
17 Oct 2023
Generation or Replication: Auscultating Audio Latent Diffusion Models
Dimitrios Bralios
G. Wichern
François G. Germain
Zexu Pan
Sameer Khurana
Chiori Hori
Jonathan Le Roux
DiffM
8
6
0
16 Oct 2023
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
19
5
0
13 Oct 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
30
21
0
12 Oct 2023
LLark: A Multimodal Instruction-Following Language Model for Music
Josh Gardner
Simon Durand
Daniel Stoller
Rachel M. Bittner
AuLLM
20
14
0
11 Oct 2023
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
...
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Liejie Yuan
VLM
MLLM
15
200
0
03 Oct 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
24
92
0
27 Sep 2023
Online Active Learning For Sound Event Detection
Mark Lindsey
Ankit Shah
Francis Kubala
R. M. Stern
17
0
0
25 Sep 2023
VoiceLDM: Text-to-Speech with Environmental Context
Yeong-Won Lee
In-won Yeon
Juhan Nam
Joon Son Chung
VLM
DiffM
8
10
0
24 Sep 2023
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
Aya Watanabe
Shinnosuke Takamichi
Yuki Saito
Wataru Nakata
Detai Xin
Hiroshi Saruwatari
10
9
0
24 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
V. Katsouros
CLIP
25
6
0
21 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
18
20
0
20 Sep 2023
Investigating Personalization Methods in Text to Music Generation
Manos Plitsis
Theodoros Kouzelis
Georgios Paraskevopoulos
V. Katsouros
Yannis Panagakis
DiffM
17
10
0
20 Sep 2023
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Yatong Bai
Trung D. Q. Dang
Dung N. Tran
K. Koishida
Somayeh Sojoudi
DiffM
31
22
0
19 Sep 2023
Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping
Subash Khanal
S. Sastry
A. Dhakal
Nathan Jacobs
28
8
0
19 Sep 2023
RECAP: Retrieval-Augmented Audio Captioning
Sreyan Ghosh
Sonal Kumar
Chandra Kiran Reddy Evuru
R. Duraiswami
Dinesh Manocha
VLM
62
17
0
18 Sep 2023
Zero- and Few-shot Sound Event Localization and Detection
Kazuki Shimada
Kengo Uchida
Yuichiro Koyama
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
Tatsuya Kawahara
20
4
0
17 Sep 2023
MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Zihao Deng
Yi Ma
Yudong Liu
Rongchen Guo
Ge Zhang
Wenhu Chen
Wenhao Huang
Emmanouil Benetos
MLLM
AuLLM
26
17
0
15 Sep 2023
Audio-free Prompt Tuning for Language-Audio Models
Yiming Li
Xiangdong Wang
Hong Liu
CLIP
VLM
6
9
0
15 Sep 2023
Retrieval-Augmented Text-to-Audio Generation
Yiitan Yuan
Haohe Liu
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
RALM
10
24
0
14 Sep 2023
Training Audio Captioning Models without Audio
Soham Deshmukh
Benjamin Elizalde
Dimitra Emmanouilidou
Bhiksha Raj
Rita Singh
Huaming Wang
19
18
0
14 Sep 2023
Diffusion models for audio semantic communication
Eleonora Grassucci
Christian Marinoni
Andrea Rodriguez
Danilo Comminiello
DiffM
11
23
0
13 Sep 2023