arXiv: 2401.17690

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

31 January 2024

Jaeyeon Kim

Jinjoo Lee

Sang Hoon Woo

Papers citing "EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning"

13 / 13 papers shown

SAR-LM: Symbolic Audio Reasoning with Large Language Models

Emmanouil Benetos

213

0

0

09 Nov 2025

Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics?

Thrasyvoulos N Pappas

93

0

0

16 Oct 2025

Audio-Guided Visual Editing with Complex Multi-Modal Prompts

109

0

0

28 Aug 2025

MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks

Heinrich Dinkel

234

3

0

31 Jul 2025

CatchPhrase: EXPrompt-Guided Encoder Adaptation for Audio-to-Image Generation

216

2

0

24 Jul 2025

MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation

202

2

0

20 Jun 2025

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer

Binh Thien Nguyen

Masahiro Yasuda

Yasunori Ohishi

Daisuke Niizumi

189

2

0

01 Jun 2025

Audio-Language Models for Audio-Centric Tasks: A survey

341

15

0

28 Jan 2025

Audio-Language Datasets of Scenes and Events: A Survey

IEEE Access, 2024

Michele Esposito

469

6

0

10 Jan 2025

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

380

18

0

03 Jan 2025

PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Ramaneswaran Selvakumar

174

0

0

19 Oct 2024

DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Ziyang Ma

321

16

0

12 Oct 2024

Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data

International Conference on Learning Representations (ICLR), 2024

Bryan Catanzaro

Dinesh Manocha

415

6

0

02 Oct 2024
