2306.14170
Cited By
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction
25 June 2023
Jiuxin Lin
X. Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen M. Meng
Papers citing
"AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction"
16 / 16 papers shown
Title
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Zheng-Hua Tan
36
0
0
06 Jan 2025
Robust Audio-Visual Speech Enhancement: Correcting Misassignments in Complex Environments with Advanced Post-Processing
Wenze Ren
Kuo-Hsuan Hung
Rong-Yu Chao
YouJin Li
Hsin-Min Wang
Yu Tsao
15
0
0
22 Sep 2024
Cross-attention Inspired Selective State Space Models for Target Sound Extraction
Donghang Wu
Yiwen Wang
Xihong Wu
T. Qu
Mamba
23
3
0
07 Sep 2024
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
Tianrui Pan
Jie Liu
Bohan Wang
Jie Tang
Gangshan Wu
24
1
0
27 Jul 2024
Target conversation extraction: Source separation using turn-taking dynamics
Tuochao Chen
Qirui Wang
Bohan Wu
Malek Itani
Sefik Emre Eskimez
Takuya Yoshioka
Shyamnath Gollakota
20
4
0
15 Jul 2024
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling
Vahid Ahmadi Kalkhorani
Cheng Yu
Anurag Kumar
Ke Tan
Buye Xu
DeLiang Wang
26
0
0
17 Jun 2024
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao
Xinyuan Qian
Yidi Jiang
Junjie Li
Jiadong Wang
Haizhou Li
19
1
0
29 Apr 2024
Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction
Zhaoxi Mu
Xinyu Yang
18
5
0
19 Apr 2024
TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion
Samuel Pegg
Kai Li
Xiaolin Hu
18
1
0
25 Jan 2024
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
13
10
0
11 Oct 2023
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation
Samuel Pegg
Kai Li
Xiaolin Hu
8
4
0
29 Sep 2023
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech
Jun Yu Li
Ruijie Tao
Zexu Pan
Meng Ge
Shuai Wang
Haizhou Li
10
5
0
15 Sep 2023
VoxBlink: A Large Scale Speaker Verification Dataset on Camera
Yuke Lin
Xiaoyi Qin
Guoqing Zhao
Ming Cheng
Ning Jiang
Haiying Wu
Ming Li
30
13
0
14 Aug 2023
TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
Zhong-Qiu Wang
Samuele Cornell
Shukjae Choi
Younglo Lee
Byeonghak Kim
Shinji Watanabe
53
95
0
08 Sep 2022
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
182
196
0
08 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
212
1,954
0
14 Jun 2018