ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1805.00833
  4. Cited By
Learnable PINs: Cross-Modal Embeddings for Person Identity

Learnable PINs: Cross-Modal Embeddings for Person Identity

2 May 2018
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
    SSL
ArXivPDFHTML

Papers citing "Learnable PINs: Cross-Modal Embeddings for Person Identity"

31 / 31 papers shown
Title
Overview of Speaker Modeling and Its Applications: From the Lens of Deep
  Speaker Representation Learning
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
32
4
0
21 Jul 2024
Audio-Visual Speaker Verification via Joint Cross-Attention
Audio-Visual Speaker Verification via Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
26
6
0
28 Sep 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data
Speaker Recognition in Realistic Scenario Using Multimodal Data
Saqlain Hussain Shah
M. S. Saeed
Shah Nawaz
Muhammad Haroon Yousaf
CVBM
21
8
0
25 Feb 2023
Audio Representation Learning by Distilling Video as Privileged
  Information
Audio Representation Learning by Distilling Video as Privileged Information
Amirhossein Hajavi
Ali Etemad
13
4
0
06 Feb 2023
A Multimodal Sensor Fusion Framework Robust to Missing Modalities for
  Person Recognition
A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition
Vijay John
Yasutomo Kawanishi
27
7
0
20 Oct 2022
Robust Sound-Guided Image Manipulation
Robust Sound-Guided Image Manipulation
Seung Hyun Lee
Gyeongrok Oh
Wonmin Byeon
Sang Ho Yoon
Jinkyu Kim
Sangpil Kim
DiffM
26
7
0
30 Aug 2022
Learning Branched Fusion and Orthogonal Projection for Face-Voice
  Association
Learning Branched Fusion and Orthogonal Projection for Face-Voice Association
M. S. Saeed
Shah Nawaz
M. H. Khan
S. Javed
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
21
4
0
22 Aug 2022
Is an Object-Centric Video Representation Beneficial for Transfer?
Is an Object-Centric Video Representation Beneficial for Transfer?
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
ViT
31
26
0
20 Jul 2022
Audio Self-supervised Learning: A Survey
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
35
106
0
02 Mar 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery
  Detection
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
A. Haliassos
Rodrigo Mira
Stavros Petridis
M. Pantic
CVBM
27
125
0
18 Jan 2022
Cross Modal Retrieval with Querybank Normalisation
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
27
84
0
23 Dec 2021
Sound-Guided Semantic Image Manipulation
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
21
43
0
30 Nov 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning
  for Visual Sound Separation
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
21
8
0
26 Oct 2021
Neural Dubber: Dubbing for Videos According to Scripts
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised
  Audiovisual Representation Learning
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
27
0
0
13 Oct 2021
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
FaVoA: Face-Voice Association Favours Ambiguous Speaker Detection
Hugo C. C. Carneiro
C. Weber
S. Wermter
CVBM
23
7
0
01 Sep 2021
ASMR: Learning Attribute-Based Person Search with Adaptive Semantic
  Margin Regularizer
ASMR: Learning Attribute-Based Person Search with Adaptive Semantic Margin Regularizer
Boseung Jeong
Jicheol Park
Suha Kwak
60
21
0
10 Aug 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection
  and Tracking with Sound by Distilling Multimodal Knowledge
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
MAAS: Multi-modal Assignation for Active Speaker Detection
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Bernard Ghanem
59
51
0
11 Jan 2021
Multimodal Target Speech Separation with Voice and Face References
Multimodal Target Speech Separation with Voice and Face References
Leyuan Qu
C. Weber
S. Wermter
CVBM
19
19
0
17 May 2020
FaceFilter: Audio-visual speech separation using still images
FaceFilter: Audio-visual speech separation using still images
Soo-Whan Chung
Soyeon Choe
Joon Son Chung
Hong-Goo Kang
CVBM
21
66
0
14 May 2020
Cross-modal Speaker Verification and Recognition: A Multilingual
  Perspective
Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
M. S. Saeed
Shah Nawaz
Pietro Morerio
Arif Mahmood
I. Gallo
Muhammad Haroon Yousaf
Alessio Del Bue
CVBM
24
25
0
28 Apr 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
Disentangled Speech Embeddings using Cross-modal Self-supervision
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
16
88
0
20 Feb 2020
Deep Audio-Visual Learning: A Survey
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
R. He
31
156
0
14 Jan 2020
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action
  Recognition
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
EgoV
16
332
0
22 Aug 2019
Video Face Clustering with Unknown Number of Clusters
Video Face Clustering with Unknown Number of Clusters
Makarand Tapaswi
M. Law
Sanja Fidler
CVBM
19
60
0
09 Aug 2019
Use What You Have: Video Retrieval Using Representations From
  Collaborative Experts
Use What You Have: Video Retrieval Using Representations From Collaborative Experts
Yang Liu
Samuel Albanie
Arsha Nagrani
Andrew Zisserman
28
387
0
31 Jul 2019
Self-supervised learning of a facial attribute embedding from video
Self-supervised learning of a facial attribute embedding from video
Olivia Wiles
A. Sophia Koepke
Andrew Zisserman
CVBM
SSL
18
132
0
21 Aug 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
22
270
0
16 Aug 2018
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Yandong Wen
Mahmoud Al Ismail
Weiyang Liu
Bhiksha Raj
Rita Singh
FedML
16
70
0
12 Jul 2018
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
219
2,233
0
14 Jun 2018
1