Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

21 September 2018
Soo-Whan Chung
Joon Son Chung
Hong-Goo Kang
arXiv: 1809.08001

Papers citing "Perfect match: Improved cross-modal embeddings for audio-visual synchronisation"

28 / 78 papers shown
Automatic audiovisual synchronisation for ultrasound tongue imaging
Speech Communication (Speech Commun.), 2021
Aciel Eshky
J. Cleland
M. Ribeiro
Eleanor Sugden
Korin Richmond
Steve Renals
79
7
0
31 May 2021
Divide and Contrast: Self-supervised Learning from Uncurated Data
IEEE International Conference on Computer Vision (ICCV), 2021
Yonglong Tian
Olivier J. Hénaff
Aaron van den Oord
SSL
335
110
0
17 May 2021
Representation Learning via Global Temporal Alignment and Cycle-Consistency
Computer Vision and Pattern Recognition (CVPR), 2021
Isma Hadji
Konstantinos G. Derpanis
Allan D. Jepson
AI4TS
300
61
0
11 May 2021
Self-supervised object detection from audio-visual correspondence
Computer Vision and Pattern Recognition (CVPR), 2021
Triantafyllos Afouras
Yuki M. Asano
Francois Fagan
Andrea Vedaldi
Florian Metze
SSL
326
53
0
13 Apr 2021
Contrastive Learning of Global-Local Video Representations
Shuang Ma
Zhaoyang Zeng
Daniel J. McDuff
Yale Song
SSL
203
8
0
07 Apr 2021
Composable Augmentation Encoding for Video Representation Learning
IEEE International Conference on Computer Vision (ICCV), 2021
Chen Sun
Arsha Nagrani
Yonglong Tian
Cordelia Schmid
SSL
AI4TS
254
19
0
01 Apr 2021
Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2021
Jiyoung Lee
Soo-Whan Chung
Sunok Kim
Hong-Goo Kang
Kwanghoon Sohn
178
59
0
25 Mar 2021
Automated Video Labelling: Identifying Faces by Corroborative Evidence
Conference on Multimedia Information Processing and Retrieval (MIPR), 2021
Andrew Brown
Ernesto Coto
Andrew Zisserman
CVBM
108
15
0
10 Feb 2021
Cross-Modal Contrastive Learning for Text-to-Image Generation
Computer Vision and Pattern Recognition (CVPR), 2021
Han Zhang
Jing Yu Koh
Jason Baldridge
Honglak Lee
Yinfei Yang
GAN
583
423
0
12 Jan 2021
MAAS: Multi-modal Assignation for Active Speaker Detection
IEEE International Conference on Computer Vision (ICCV), 2021
Juan Carlos León Alcázar
Fabian Caba Heilbron
Ali K. Thabet
Guohao Li
354
63
0
11 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Computer Vision and Pattern Recognition (CVPR), 2021
Ruohan Gao
Kristen Grauman
CVBM
453
239
0
08 Jan 2021
Playing a Part: Speaker Verification at the Movies
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
A. Brown
Jaesung Huh
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
221
26
0
29 Oct 2020
Themes Informed Audio-visual Correspondence Learning
Runze Su
Fei Tao
Xudong Liu
Haoran Wei
Xiaorong Mei
Z. Duan
Lei Yuan
Ji Liu
Yuying Xie
198
6
0
14 Sep 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
295
118
0
13 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
European Conference on Computer Vision (ECCV), 2020
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
243
278
0
10 Aug 2020
Spot the conversation: speaker diarisation in the wild
Joon Son Chung
Jaesung Huh
Arsha Nagrani
Triantafyllos Afouras
Andrew Zisserman
VGen
296
180
0
02 Jul 2020
Modality Dropout for Improved Performance-driven Talking Faces
International Conference on Multimodal Interaction (ICMI), 2020
Ahmed Hussen Abdelaziz
B. Theobald
Paul Dixon
Reinhard Knothe
N. Apostoloff
Sachin Kajareker
211
44
0
27 May 2020
What Makes for Good Views for Contrastive Learning?
Yonglong Tian
Chen Sun
Ben Poole
Dilip Krishnan
Cordelia Schmid
Phillip Isola
SSL
434
1,495
0
20 May 2020
Active Speakers in Context
Juan Carlos León Alcázar
Fabian Caba Heilbron
Long Mai
Federico Perazzi
Joon-Young Lee
Pablo Arbelaez
Guohao Li
134
73
0
20 May 2020
End-to-End Lip Synchronisation Based on Pattern Classification
You Jin Kim
Hee-Soo Heo
Soo-Whan Chung
Bong-Jin Lee
CVBM
164
0
0
18 May 2020
Disentangled Speech Embeddings using Cross-modal Self-supervision
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Arsha Nagrani
Joon Son Chung
Samuel Albanie
Andrew Zisserman
SSL
189
95
0
20 Feb 2020
AlignNet: A Unifying Approach to Audio-Visual Alignment
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Jianren Wang
Zhaoyuan Fang
Hang Zhao
152
42
0
12 Feb 2020
Deep Audio-Visual Learning: A Survey
International Journal of Automation and Computing (IJAC), 2020
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
223
178
0
14 Jan 2020
Detecting Adversarial Attacks On Audiovisual Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Pingchuan Ma
Stavros Petridis
Maja Pantic
AAML
171
22
0
18 Dec 2019
The sound of my voice: speaker representation loss for target voice separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2019
Seongkyu Mun
Soyeon Choe
Jaesung Huh
Joon Son Chung
195
16
0
06 Nov 2019
Synchronising audio and ultrasound by learning cross-modal embeddings
Interspeech, 2019
Aciel Eshky
M. Ribeiro
Korin Richmond
Steve Renals
125
5
0
01 Jul 2019
Naver at ActivityNet Challenge 2019 -- Task B Active Speaker Detection (AVA)
Joon Son Chung
104
55
0
25 Jun 2019
Who said that?: Audio-visual speaker diarisation of real-world meetings
Interspeech, 2019
Joon Son Chung
Bong-Jin Lee
Icksang Han
98
49
0
24 Jun 2019