Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1805.00833
Cited By
Learnable PINs: Cross-Modal Embeddings for Person Identity
2 May 2018
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learnable PINs: Cross-Modal Embeddings for Person Identity"
31 / 31 papers shown
Title
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
32
4
0
21 Jul 2024
Audio-Visual Speaker Verification via Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
26
6
0
28 Sep 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data
Saqlain Hussain Shah
M. S. Saeed
Shah Nawaz
Muhammad Haroon Yousaf
CVBM
21
8
0
25 Feb 2023
Audio Representation Learning by Distilling Video as Privileged Information
Amirhossein Hajavi
Ali Etemad
13
4
0
06 Feb 2023
A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition
Vijay John
Yasutomo Kawanishi
27
7
0
20 Oct 2022
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
26
7
0
30 Aug 2022
Learning Branched Fusion and Orthogonal Projection for Face-Voice Association
M. S. Saeed
Shah Nawaz
M. H. Khan
S. Javed
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
21
4
0
22 Aug 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
31
26
0
20 Jul 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
35
106
0
02 Mar 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
M. Pantic
CVBM
27
125
0
18 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
27
84
0
23 Dec 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
21
43
0
30 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
21
8
0
26 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
27
0
0
13 Oct 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
Hugo C. C. Carneiro
C. Weber
S. Wermter
CVBM
23
7
0
01 Sep 2021
ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
Boseung Jeong
Jicheol Park
Suha Kwak
60
21
0
10 Aug 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Bernard Ghanem
59
51
0
11 Jan 2021
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu
C. Weber
S. Wermter
CVBM
19
19
0
17 May 2020
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
21
66
0
14 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
24
25
0
28 Apr 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
16
88
0
20 Feb 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
R. He
31
156
0
14 Jan 2020
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Video Face Clustering with Unknown Number of Clusters
Makarand Tapaswi
M. Law
Sanja Fidler
CVBM
19
60
0
09 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
28
387
0
31 Jul 2019
Self-supervised learning of a facial attribute embedding from video
Olivia Wiles
A. Sophia Koepke
Andrew Zisserman
CVBM
SSL
18
132
0
21 Aug 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
22
270
0
16 Aug 2018
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Yandong Wen
Mahmoud Al Ismail
Weiyang Liu
Bhiksha Raj
Rita Singh
FedML
16
70
0
12 Jul 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
219
2,233
0
14 Jun 2018
1