2404.12725
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
19 April 2024
Zhaoxi Mu
Xinyu Yang
Papers citing
"Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction"
7 / 7 papers shown
Title
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM
KELM
VLM
53
0
0
06 May 2025
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization
Detao Bai
Zhiheng Ma
Xihan Wei
Liefeng Bo
40
0
0
06 May 2025
Distance Based Single-Channel Target Speech Extraction
Runwu Shi
Benjamin Yen
Kazuhiro Nakadai
23
0
0
31 Dec 2024
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
Donghang Wu
Yiwen Wang
Xihong Wu
T. Qu
Mamba
26
3
0
07 Sep 2024
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
Vahid Ahmadi Kalkhorani
Cheng Yu
Anurag Kumar
Ke Tan
Buye Xu
DeLiang Wang
29
0
0
17 Jun 2024
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
185
196
0
08 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
1,954
0
14 Jun 2018