arXiv:2102.01326
Multimodal Attention Fusion for Target Speaker Extraction
Spoken Language Technology Workshop (SLT), 2021
2 February 2021
Hiroshi Sato
Tsubasa Ochiai
K. Kinoshita
Marc Delcroix
Tomohiro Nakatani
S. Araki
Papers citing
"Multimodal Attention Fusion for Target Speaker Extraction"
17 / 17 papers shown
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
Zixuan Li
Xueliang Zhang
Lei Miao
Zhipeng Yan
Ying Sun
Chong Zhu
168
0
0
28 May 2025
Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
Zexu Pan
Shengkui Zhao
Tingting Wang
Kun Zhou
Yukun Ma
Chong Zhang
B. Ma
226
0
0
27 May 2025
Listen to Extract: Onset-Prompted Target Speaker Extraction
Pengjie Shen
Kangrui Chen
Shulin He
Pengru Chen
Shuqi Yuan
He Kong
Xueliang Zhang
Zehao Wang
317
2
0
08 May 2025
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Zheng-Hua Tan
285
1
0
06 Jan 2025
Look Once to Hear: Target Speech Hearing with Noisy Examples
International Conference on Human Factors in Computing Systems (CHI), 2024
Bandhav Veluri
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
333
32
0
10 May 2024
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao
Xinyuan Qian
Yidi Jiang
Junjie Li
Jiadong Wang
Haizhou Li
320
3
0
29 Apr 2024
Conditional Diffusion Model for Target Speaker Extraction
Theodor Nguyen
Guangzhi Sun
Xianrui Zheng
Chao Zhang
Philip C. Woodland
DiffM
217
4
0
07 Oct 2023
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Guinan Li
Jiajun Deng
Mengzhe Geng
Zengrui Jin
Tianzi Wang
Shujie Hu
Mingyu Cui
Helen M. Meng
Xunying Liu
164
19
0
06 Jul 2023
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiuxin Lin
X. Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen M. Meng
234
39
0
25 Jun 2023
Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction
European Signal Processing Conference (EUSIPCO), 2023
Tomoya Yoshinaga
Keitaro Tanaka
Shigeo Morishima
190
1
0
10 Jun 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
194
134
0
31 Jan 2023
Anchored Speech Recognition with Neural Transducers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Desh Raj
Junteng Jia
Jay Mahadeokar
Chunyang Wu
Niko Moritz
Xiaohui Zhang
Ozlem Kalinli
240
2
0
20 Oct 2022
ConceptBeam: Concept Driven Target Speech Extraction
ACM Multimedia (ACM MM), 2022
Yasunori Ohishi
Marc Delcroix
Tsubasa Ochiai
S. Araki
Daiki Takeuchi
Daisuke Niizumi
Akisato Kimura
Noboru Harada
K. Kashino
165
23
0
25 Jul 2022
Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhongweiyang Xu
Xulin Fan
M. Hasegawa-Johnson
141
3
0
09 Jul 2022
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Interspeech, 2022
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
K. Kinoshita
Takafumi Moriya
Naoki Makishima
Mana Ihori
Tomohiro Tanaka
Ryo Masumura
110
6
0
16 Jun 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
European Conference on Computer Vision (ECCV), 2022
Juan F. Montesinos
V. S. Kadandale
G. Haro
ViT
275
25
0
08 Mar 2022
USEV: Universal Speaker Extraction with Visual Cue
Zexu Pan
Meng Ge
Haizhou Li
259
55
0
30 Sep 2021