Multimodal Attention Fusion for Target Speaker Extraction

Spoken Language Technology Workshop (SLT), 2021
2 February 2021
Hiroshi Sato
Tsubasa Ochiai
K. Kinoshita
Marc Delcroix
Tomohiro Nakatani
S. Araki
arXiv:2102.01326

Papers citing "Multimodal Attention Fusion for Target Speaker Extraction"

17 / 17 papers shown
Two-stage Audio-Visual Target Speaker Extraction System for Real-Time Processing On Edge Device
Zixuan Li
Xueliang Zhang
Lei Miao
Zhipeng Yan
Ying Sun
Chong Zhu
28 May 2025
Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction
Zexu Pan
Shengkui Zhao
Tingting Wang
Kun Zhou
Yukun Ma
Chong Zhang
B. Ma
27 May 2025
Listen to Extract: Onset-Prompted Target Speaker Extraction
Pengjie Shen
Kangrui Chen
Shulin He
Pengru Chen
Shuqi Yuan
He Kong
Xueliang Zhang
Zehao Wang
08 May 2025
Noise-Robust Target-Speaker Voice Activity Detection Through Self-Supervised Pretraining
H. S. Bovbjerg
Jan Østergaard
Jesper Jensen
Zheng-Hua Tan
06 Jan 2025
Look Once to Hear: Target Speech Hearing with Noisy Examples
International Conference on Human Factors in Computing Systems (CHI), 2024
Bandhav Veluri
Malek Itani
Tuochao Chen
Takuya Yoshioka
Shyamnath Gollakota
10 May 2024
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Ruijie Tao
Xinyuan Qian
Yidi Jiang
Junjie Li
Jiadong Wang
Haizhou Li
29 Apr 2024
Conditional Diffusion Model for Target Speaker Extraction
Theodor Nguyen
Guangzhi Sun
Xianrui Zheng
Chao Zhang
Philip C. Woodland
07 Oct 2023
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Guinan Li
Jiajun Deng
Mengzhe Geng
Zengrui Jin
Tianzi Wang
Shujie Hu
Mingyu Cui
Helen M. Meng
Xunying Liu
06 Jul 2023
AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jiuxin Lin
X. Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen M. Meng
25 Jun 2023
Audio-Visual Speech Enhancement With Selective Off-Screen Speech Extraction
European Signal Processing Conference (EUSIPCO), 2023
Tomoya Yoshinaga
Keitaro Tanaka
Shigeo Morishima
10 Jun 2023
Neural Target Speech Extraction: An Overview
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2023
Kateřina Žmolíková
Marc Delcroix
Tsubasa Ochiai
K. Kinoshita
Jan "Honza" Černocký
Dong Yu
31 Jan 2023
Anchored Speech Recognition with Neural Transducers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Desh Raj
Junteng Jia
Jay Mahadeokar
Chunyang Wu
Niko Moritz
Xiaohui Zhang
Ozlem Kalinli
20 Oct 2022
ConceptBeam: Concept Driven Target Speech Extraction
ACM Multimedia (ACM MM), 2022
Yasunori Ohishi
Marc Delcroix
Tsubasa Ochiai
S. Araki
Daiki Takeuchi
Daisuke Niizumi
Akisato Kimura
Noboru Harada
K. Kashino
25 Jul 2022
Dual-Path Cross-Modal Attention for better Audio-Visual Speech Extraction
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhongweiyang Xu
Xulin Fan
M. Hasegawa-Johnson
09 Jul 2022
Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Interspeech (Interspeech), 2022
Hiroshi Sato
Tsubasa Ochiai
Marc Delcroix
K. Kinoshita
Takafumi Moriya
Naoki Makishima
Mana Ihori
Tomohiro Tanaka
Ryo Masumura
16 Jun 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
European Conference on Computer Vision (ECCV), 2022
Juan F. Montesinos
V. S. Kadandale
G. Haro
08 Mar 2022
USEV: Universal Speaker Extraction with Visual Cue
Zexu Pan
Meng Ge
Haizhou Li
30 Sep 2021