ResearchTrend.AI

arXiv:2203.02216 · Cited By
Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Jun Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha · 4 March 2022

Papers citing "Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement"

13 / 13 papers shown
CLIP-VAD: Exploiting Vision-Language Models for Voice Activity Detection
Andrea Appiani, Cigdem Beyan · 18 Oct 2024
Robust Active Speaker Detection in Noisy Environments
Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian · 27 Mar 2024
Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech
Jun Yu Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li · 15 Sep 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai, Fiona Ryan, Wenqi Jia, Miao Liu, James M. Rehg · 06 May 2023
Egocentric Auditory Attention Localization in Conversations
Fiona Ryan, Hao Jiang, Abhinav Shukla, James M. Rehg, V. Ithapu · 28 Mar 2023
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Jun Xiong, Gang Wang, Peng Zhang, Wei Huang, Yufei Zha, Guangtao Zhai · 11 Mar 2023
A Light Weight Model for Active Speaker Detection
Junhua Liao, Haihan Duan, Kanghui Feng, Wanbing Zhao, Yanbing Yang, Liangyin Chen · 08 Mar 2023
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang, Feng Cheng, Gedas Bertasius, David J. Crandall · 19 Jan 2023
One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning
Suzhe Wang, Lincheng Li, Yueqing Ding, Xin Yu · 06 Dec 2021
The Right to Talk: An Audio-Visual Transformer Approach
Thanh-Dat Truong, C. Duong, T. D. Vu, H. Pham, Bhiksha Raj, Ngan Le, Khoa Luu · 06 Aug 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar, Fabian Caba Heilbron, Ali K. Thabet, Bernard Ghanem · 11 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao, Kristen Grauman · 08 Jan 2021
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung, Arsha Nagrani, Andrew Zisserman · 14 Jun 2018