ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.


Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events

23 February 2021
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Kai Yu

Papers citing "Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events"

17 / 17 papers shown
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Rameswar Panda
Oriol Nieto
Prem Seetharaman
Justin Salamon
89
1
0
08 May 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
164
4
0
28 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
187
3
0
10 Jan 2025
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Yaoyun Zhang
Xuenan Xu
Mengyue Wu
VGen
99
1
0
24 Dec 2024
Dissecting Temporal Understanding in Text-to-Audio Retrieval
Andreea-Maria Oncescu
João F. Henriques
A. Sophia Koepke
91
2
0
01 Sep 2024
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Zeyu Xie
Xuenan Xu
Zhizheng Wu
Mengyue Wu
89
8
0
03 Jul 2024
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Ruibo Fu
Shuchen Shi
Hongming Guo
Tao Wang
Chunyu Qiang
...
Zhiyong Wang
Yukun Liu
Xuefei Liu
Shuai Zhang
Guanjun Li
VGen
45
0
0
15 Jun 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA, AuLLM, ALM
108
85
0
12 Feb 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
81
9
0
05 Jan 2024
DiffSED: Sound Event Detection with Denoising Diffusion
Swapnil Bhosale
Sauradip Nag
Diptesh Kanojia
Jiankang Deng
Xiatian Zhu
DiffM
91
8
0
14 Aug 2023
Large Language Models are Few-Shot Health Learners
Xin Liu
Daniel J. McDuff
G. Kovács
I. Galatzer-Levy
Jacob Sunshine
Jiening Zhan
M. Poh
Shun Liao
P. Achille
Shwetak N. Patel
LM&MA, AI4MH
132
116
0
24 May 2023
Text-to-Audio Grounding Based Novel Metric for Evaluating Audio Caption Similarity
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
65
1
0
03 Oct 2022
Automated Audio Captioning and Language-Based Audio Retrieval
Clive Gomes
Hyejin Park
Patrick Kollman
Yi-Zhe Song
Iffanice Houndayi
Ankit Parag Shah
118
1
0
08 Jul 2022
Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
Andrew Koh
Chng Eng Siong
144
1
0
29 Jun 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
84
16
0
11 May 2022
Audio Retrieval with Natural Language Queries: A Benchmark Study
A. Sophia Koepke
Andreea-Maria Oncescu
João F. Henriques
Zeynep Akata
Samuel Albanie
78
102
0
17 Dec 2021
Listen As You Wish: Audio based Event Detection via Text-to-Audio Grounding in Smart Cities
Haoyu Tang
Yunxiao Wang
Jihua Zhu
Shuai Zhang
Mingzhu Xu
Qinghai Zheng
Yupeng Hu
125
1
0
27 Jun 2021