EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

31 January 2024
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIP, VLM

Papers citing "EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning"

12 / 12 papers shown
SAR-LM: Symbolic Audio Reasoning with Large Language Models
Termeh Taheri
Yinghao Ma
Emmanouil Benetos
AuLLM, LRM
122
0
0
09 Nov 2025
Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics?
Qixin Deng
Bryan Pardo
Thrasyvoulos N Pappas
60
0
0
16 Oct 2025
Audio-Guided Visual Editing with Complex Multi-Modal Prompts
Hyeonyu Kim
Seokhoon Jeong
Seonghee Han
Chanhyuk Choi
Taehwan Kim
DiffM
49
0
0
28 Aug 2025
MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks
Yadong Niu
Tianzi Wang
Heinrich Dinkel
Xingwei Sun
Jiahao Zhou
Gang Li
Jizhong Liu
Xunying Liu
Junbo Zhang
Jian Luan
AuLLM
154
2
0
31 Jul 2025
CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation
Hyunwoo Oh
SeungJu Cha
Kwanyoung Lee
Si-Woo Kim
Dong-Jin Kim
164
2
0
24 Jul 2025
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Shoubin Yu
Yue Zhang
Ziyang Wang
Jaehong Yoon
Mohit Bansal
MoE, LRM
117
3
0
20 Jun 2025
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
Daiki Takeuchi
Binh Thien Nguyen
Masahiro Yasuda
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
VLM
140
1
0
01 Jun 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
294
14
0
28 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
IEEE Access, 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
354
6
0
10 Jan 2025
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Chun-Yi Kuan
Hung-yi Lee
AuLLM, LRM
270
16
0
03 Jan 2025
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
271
12
0
12 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
International Conference on Learning Representations (ICLR), 2024
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
334
5
0
02 Oct 2024