On Learning Associations of Faces and Voices

15 May 2018

Changil Kim

Hijung Valentina Shin

Papers citing "On Learning Associations of Faces and Voices"

21 / 21 papers shown

Title
Hear Your Face: Face-based voice conversion with F0 estimation Jaejun Lee Yoori Oh Injune Hwang Kyogu Lee CVBM 29 1 0 19 Aug 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning Shuai Wang Zheng-Shou Chen Kong Aik Lee Yan-min Qian Haizhou Li 39 4 0 21 Jul 2024
Audio-Visual Speaker Verification via Joint Cross-Attention R Gnana Praveen Jahangir Alam 31 6 0 28 Sep 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data Saqlain Hussain Shah M. S. Saeed Shah Nawaz Muhammad Haroon Yousaf CVBM 26 8 0 25 Feb 2023
Learning Branched Fusion and Orthogonal Projection for Face-Voice Association M. S. Saeed Shah Nawaz M. H. Khan S. Javed Muhammad Haroon Yousaf Alessio Del Bue CVBM 27 4 0 22 Aug 2022
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors Sindhu B. Hegde Rudrabha Mukhopadhyay Vinay P. Namboodiri C. V. Jawahar CVBM 16 1 0 17 Aug 2022
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 36 39 0 15 Oct 2021
Self-Supervised 3D Face Reconstruction via Conditional Estimation Yandong Wen Weiyang Liu Bhiksha Raj Rita Singh CVBM 28 21 0 10 Oct 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection Hugo C. C. Carneiro C. Weber S. Wermter CVBM 31 7 0 01 Sep 2021
A cappella: Audio-visual Singing Voice Separation Juan F. Montesinos V. S. Kadandale G. Haro 38 16 0 20 Apr 2021
MAAS: Multi-modal Assignation for Active Speaker Detection Juan Carlos León Alcázar Fabian Caba Heilbron Ali K. Thabet Guohao Li 62 51 0 11 Jan 2021
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos Andrew Rouditchenko Angie Boggust David Harwath Brian Chen D. Joshi ... Rogerio Feris Brian Kingsbury M. Picheny Antonio Torralba James R. Glass SSL 22 141 0 16 Jun 2020
Active Speakers in Context Juan Carlos León Alcázar Fabian Caba Heilbron Long Mai Federico Perazzi Joon-Young Lee Pablo Arbelaez Guohao Li 19 61 0 20 May 2020
Multimodal Target Speech Separation with Voice and Face References Leyuan Qu C. Weber S. Wermter CVBM 19 19 0 17 May 2020
FaceFilter: Audio-visual speech separation using still images Soo-Whan Chung Soyeon Choe Joon Son Chung Hong-Goo Kang CVBM 21 66 0 14 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective M. S. Saeed Shah Nawaz Pietro Morerio Arif Mahmood I. Gallo Muhammad Haroon Yousaf Alessio Del Bue CVBM 26 25 0 28 Apr 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani Joon Son Chung Samuel Albanie Andrew Zisserman SSL 21 88 0 20 Feb 2020
Learning to Localize Sound Sources in Visual Scenes: Analysis and Applications Arda Senocak Tae-Hyun Oh Junsik Kim Ming-Hsuan Yang In So Kweon SSL 30 52 0 20 Nov 2019
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild Samuel Albanie Arsha Nagrani Andrea Vedaldi Andrew Zisserman CVBM 27 270 0 16 Aug 2018
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces Yandong Wen Mahmoud Al Ismail Weiyang Liu Bhiksha Raj Rita Singh FedML 22 70 0 12 Jul 2018
Lip Reading Sentences in the Wild Joon Son Chung A. Senior Oriol Vinyals Andrew Zisserman 164 784 0 16 Nov 2016