1809.08001
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation
21 September 2018
Soo-Whan Chung
Joon Son Chung
Hong-Goo Kang
Papers citing
"Perfect match: Improved cross-modal embeddings for audio-visual synchronisation"
50 / 78 papers shown
Seeing What You Say: Expressive Image Generation from Speech
Jiyoung Lee
S. Park
Sanghyuk Chun
Soo-Whan Chung
DiffM
VGen
236
1
0
05 Nov 2025
Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm
Lin Zhang
Zefan Cai
Jiuxiang Gu
Shentong Mo
Jinhong Lin
...
Ruiyi Zhang
Wen Xiao
Tong Sun
Junjie Hu
Pedro Morgado
VGen
171
1
0
05 Aug 2025
Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation
Dogucan Yaman
Fevziye Irem Eyiokur
Leonard Barmann
H. K. Ekenel
Alexander H. Waibel
CVBM
196
0
0
28 Jul 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
192
0
0
14 Jul 2025
UniSync: A Unified Framework for Audio-Visual Synchronization
Tao Feng
Yifan Xie
Xun Guan
Jiyuan Song
Z. Liu
Fei Ma
Fei Richard Yu
305
4
0
20 Mar 2025
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP
VLM
187
2
0
18 Sep 2024
Interpretable Convolutional SyncNet
Sungjoon Park
Jaesub Yun
Donggeon Lee
Minsik Park
291
1
0
02 Sep 2024
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization
Luyao Cheng
Hui Wang
Siqi Zheng
Yafeng Chen
Rongjie Huang
Qinglin Zhang
Qian Chen
Xihao Li
220
5
0
22 Aug 2024
A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection
Kyungbok Lee
You Zhang
Zhiyao Duan
348
3
0
20 Jun 2024
Audio-Visual Talker Localization in Video for Spatial Sound Reproduction
Davide Berghi
Philip J. B. Jackson
222
1
0
01 Jun 2024
Audio-Synchronized Visual Animation
European Conference on Computer Vision (ECCV), 2024
Lin Zhang
Shentong Mo
Yijing Zhang
Pedro Morgado
DiffM
242
33
0
08 Mar 2024
Pretext Training Algorithms for Event Sequence Data
Yimu Wang
He Zhao
Ruizhi Deng
Frederick Tung
Greg Mori
AI4TS
158
0
0
16 Feb 2024
Synchformer: Efficient Synchronization from Sparse Cues
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
242
57
0
29 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
International Journal of Computer Vision (IJCV), 2024
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
345
9
0
08 Jan 2024
GestSync: Determining who is speaking without a talking head
British Machine Vision Conference (BMVC), 2023
Sindhu B. Hegde
Andrew Zisserman
157
2
0
08 Oct 2023
Audio-driven Talking Face Generation with Stabilized Synchronization Loss
European Conference on Computer Vision (ECCV), 2023
Dogucan Yaman
Fevziye Irem Eyiokur
Leonard Barmann
H. K. Ekenel
Alexander Waibel
CVBM
414
11
0
18 Jul 2023
Backchannel Detection and Agreement Estimation from Video with Transformer Networks
IEEE International Joint Conference on Neural Networks (IJCNN), 2023
A. Amer
Chirag Bhuvaneshwara
G. Addluri
Mohammed Maqsood Shaik
Vedant Bonde
Philippe Muller
225
9
0
02 Jun 2023
ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Akash Gupta
Rohun Tripathi
Won-Kap Jang
218
9
0
21 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
IEEE Transactions on Biometrics Behavior and Identity Science (TBBIS), 2023
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
177
5
0
09 Mar 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Chao Feng
Ziyang Chen
Andrew Owens
272
112
0
04 Jan 2023
Jointly Learning Visual and Auditory Speech Representations from Raw Data
International Conference on Learning Representations (ICLR), 2022
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
309
70
0
12 Dec 2022
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
IEEE International Conference on Computer Vision (ICCV), 2022
Zhentao Yu
Zixin Yin
Deyu Zhou
Duomin Wang
Finn Wong
Baoyuan Wang
DiffM
213
55
0
07 Dec 2022
SyncTalkFace: Talking Face Generation with Precise Lip-Syncing via Audio-Lip Memory
AAAI Conference on Artificial Intelligence (AAAI), 2022
Se Jin Park
Minsu Kim
Joanna Hong
J. Choi
Y. Ro
CVBM
279
103
0
02 Nov 2022
Multimodal Transformer Distillation for Audio-Visual Synchronization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xuan-Bo Chen
Haibin Wu
Chung-Che Wang
Hung-yi Lee
J. Jang
155
6
0
27 Oct 2022
Towards Effective Image Manipulation Detection with Proposal Contrastive Learning
Yuyuan Zeng
Bowen Zhao
Shanzhao Qiu
Tao Dai
Shutao Xia
169
41
0
16 Oct 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
British Machine Vision Conference (BMVC), 2022
Vladimir E. Iashin
Weidi Xie
Esa Rahtu
Andrew Zisserman
149
32
0
13 Oct 2022
Learning State-Aware Visual Representations from Audible Interactions
Neural Information Processing Systems (NeurIPS), 2022
Himangi Mittal
Pedro Morgado
Unnat Jain
Abhinav Gupta
224
28
0
27 Sep 2022
Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild
ACM Multimedia (ACM MM), 2022
Sindhu B. Hegde
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
224
16
0
01 Sep 2022
Extreme-scale Talking-Face Video Upsampling with Audio-Visual Priors
ACM Multimedia (ACM MM), 2022
Sindhu B. Hegde
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
CVBM
178
2
0
17 Aug 2022
End-To-End Audiovisual Feature Fusion for Active Speaker Detection
International Conference on Digital Image Processing (ICDIP), 2022
Fiseha B. Tesema
Zheyuan Lin
Shiqiang Zhu
Wei Song
J. Gu
Hong-Chuan Wu
159
4
0
27 Jul 2022
Deep Learning for Visual Speech Analysis: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Changchong Sheng
Gangyao Kuang
L. Bai
Chen Hou
Yike Guo
Xin Xu
M. Pietikäinen
Tianpeng Liu
VLM
321
53
0
22 May 2022
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Otavio Braga
Takaki Makino
Olivier Siohan
H. Liao
CVBM
136
20
0
11 May 2022
A Closer Look at Audio-Visual Multi-Person Speech Recognition and Active Speaker Selection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2021
Otavio Braga
Olivier Siohan
185
9
0
11 May 2022
Best of Both Worlds: Multi-task Audio-Visual Automatic Speech Recognition and Active Speaker Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Otavio Braga
Olivier Siohan
CVBM
153
12
0
10 May 2022
VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Interspeech (Interspeech), 2022
V. S. Kadandale
Juan F. Montesinos
G. Haro
230
30
0
05 Apr 2022
Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video
IEEE International Conference on Computer Vision (ICCV), 2021
Minsu Kim
Joanna Hong
Se Jin Park
Yong Man Ro
CVBM
179
48
0
04 Apr 2022
Speaker Extraction with Co-Speech Gestures Cue
IEEE Signal Processing Letters (SPL), 2022
Zexu Pan
Xinyuan Qian
Haizhou Li
SLR
176
33
0
31 Mar 2022
End to End Lip Synchronization with a Temporal AutoEncoder
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2020
Yoav Shalev
Lior Wolf
84
9
0
30 Mar 2022
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition
International Conference on Information Photonics (ICIP), 2022
Zitian Zhang
Jie Zhang
Jian-Shu Zhang
Ming Wu
Xin Fang
Lirong Dai
SSL
270
12
0
15 Feb 2022
Data standardization for robust lip sync
IEEE International Conference on Multimedia and Expo (ICME), 2022
C. Wang
259
0
0
13 Feb 2022
Leveraging Real Talking Faces via Self-Supervision for Robust Forgery Detection
Computer Vision and Pattern Recognition (CVPR), 2022
A. Haliassos
Rodrigo Mira
Stavros Petridis
Maja Pantic
CVBM
385
173
0
18 Jan 2022
End-to-end speaker diarization with transformer
Yongquan Lai
Xin Tang
Yuanyuan Fu
Rui Fang
159
1
0
14 Dec 2021
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading
Leyuan Qu
C. Weber
S. Wermter
255
33
0
09 Dec 2021
Audio-Visual Synchronisation in the wild
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
200
49
0
08 Dec 2021
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
ACM Multimedia (ACM MM), 2021
Eric Z. Xu
Zeyang Song
Satoshi Tsutsui
C. Feng
Mang Ye
Mike Zheng Shou
VGen
426
54
0
29 Nov 2021
Structure from Silence: Learning Scene Structure from Ambient Sound
Conference on Robot Learning (CoRL), 2021
Ziyang Chen
Xixi Hu
Andrew Owens
178
31
0
10 Nov 2021
Look Who's Talking: Active Speaker Detection in the Wild
You Jin Kim
Hee-Soo Heo
Soyeon Choe
Soo-Whan Chung
Yoohwan Kwon
Bong-Jin Lee
Youngki Kwon
Joon Son Chung
209
27
0
17 Aug 2021
UniCon: Unified Context Network for Robust Active Speaker Detection
ACM Multimedia (ACM MM), 2021
Yuanhang Zhang
Susan Liang
Shuang Yang
Xiao-Chang Liu
Zhongqin Wu
Shiguang Shan
Xilin Chen
CVBM
154
43
0
05 Aug 2021
Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection
ACM Multimedia (ACM MM), 2021
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
208
218
0
14 Jul 2021
Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion
Interspeech (Interspeech), 2021
Baptiste Pouthier
L. Pilati
Leela K. Gudupudi
C. Bouveyron
F. Precioso
164
12
0
07 Jun 2021