Learnable PINs: Cross-Modal Embeddings for Person Identity

2 May 2018

Papers citing "Learnable PINs: Cross-Modal Embeddings for Person Identity"

31 / 31 papers shown

Title
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning Shuai Wang Zheng-Shou Chen Kong Aik Lee Yan-min Qian Haizhou Li 32 4 0 21 Jul 2024
Audio-Visual Speaker Verification via Joint Cross-Attention R Gnana Praveen Jahangir Alam 26 6 0 28 Sep 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data Saqlain Hussain Shah M. S. Saeed Shah Nawaz Muhammad Haroon Yousaf CVBM 21 8 0 25 Feb 2023
Audio Representation Learning by Distilling Video as Privileged Information Amirhossein Hajavi Ali Etemad 13 4 0 06 Feb 2023
A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition Vijay John Yasutomo Kawanishi 27 7 0 20 Oct 2022
Robust Sound-Guided Image Manipulation Seung Hyun Lee Gyeongrok Oh Wonmin Byeon Sang Ho Yoon Jinkyu Kim Sangpil Kim DiffM 26 7 0 30 Aug 2022
Learning Branched Fusion and Orthogonal Projection for Face-Voice Association M. S. Saeed Shah Nawaz M. H. Khan S. Javed Muhammad Haroon Yousaf Alessio Del Bue CVBM 21 4 0 22 Aug 2022
Is an Object-Centric Video Representation Beneficial for Transfer? Chuhan Zhang Ankush Gupta Andrew Zisserman ViT 31 26 0 20 Jul 2022
Audio Self-supervised Learning: A Survey Shuo Liu Adria Mallol-Ragolta Emilia Parada-Cabeleiro Kun Qian Xingshuo Jing Alexander Kathan Bin Hu Bjoern W. Schuller SSL 35 106 0 02 Mar 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection A. Haliassos Rodrigo Mira Stavros Petridis M. Pantic CVBM 27 125 0 18 Jan 2022
Cross Modal Retrieval with Querybank Normalisation Simion-Vlad Bogolin Ioana Croitoru Hailin Jin Yang Liu Samuel Albanie 27 84 0 23 Dec 2021
Sound-Guided Semantic Image Manipulation Seung Hyun Lee Wonseok Roh Wonmin Byeon Sang Ho Yoon Chanyoung Kim Jinkyu Kim Sangpil Kim DiffM 21 43 0 30 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation Tanzila Rahman Mengyu Yang Leonid Sigal ViT 21 8 0 26 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts Chenxu Hu Qiao Tian Tingle Li Yuping Wang Yuxuan Wang Hang Zhao DiffM VGen 36 39 0 15 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning Haider Al-Tahan Y. Mohsenzadeh SSL AI4TS 27 0 0 13 Oct 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection Hugo C. C. Carneiro C. Weber S. Wermter CVBM 23 7 0 01 Sep 2021
ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer Boseung Jeong Jicheol Park Suha Kwak 60 21 0 10 Aug 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge Francisco Rivera Valverde Juana Valeria Hurtado Abhinav Valada 26 72 0 01 Mar 2021
MAAS: Multi-modal Assignation for Active Speaker Detection Juan Carlos León Alcázar Fabian Caba Heilbron Ali K. Thabet Bernard Ghanem 59 51 0 11 Jan 2021
Multimodal Target Speech Separation with Voice and Face References Leyuan Qu C. Weber S. Wermter CVBM 19 19 0 17 May 2020
FaceFilter: Audio-visual speech separation using still images Soo-Whan Chung Soyeon Choe Joon Son Chung Hong-Goo Kang CVBM 21 66 0 14 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective M. S. Saeed Shah Nawaz Pietro Morerio Arif Mahmood I. Gallo Muhammad Haroon Yousaf Alessio Del Bue CVBM 24 25 0 28 Apr 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision Arsha Nagrani Joon Son Chung Samuel Albanie Andrew Zisserman SSL 16 88 0 20 Feb 2020
Deep Audio-Visual Learning: A Survey Hao Zhu Mandi Luo Rui Wang A. Zheng R. He 31 156 0 14 Jan 2020
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen EgoV 16 332 0 22 Aug 2019
Video Face Clustering with Unknown Number of Clusters Makarand Tapaswi M. Law Sanja Fidler CVBM 19 60 0 09 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts Yang Liu Samuel Albanie Arsha Nagrani Andrew Zisserman 28 387 0 31 Jul 2019
Self-supervised learning of a facial attribute embedding from video Olivia Wiles A. Sophia Koepke Andrew Zisserman CVBM SSL 18 132 0 21 Aug 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild Samuel Albanie Arsha Nagrani Andrea Vedaldi Andrew Zisserman CVBM 22 270 0 16 Aug 2018
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces Yandong Wen Mahmoud Al Ismail Weiyang Liu Bhiksha Raj Rita Singh FedML 16 70 0 12 Jul 2018
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 219 2,233 0 14 Jun 2018