VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices
Interspeech, 2022
5 April 2022
V. S. Kadandale, Juan F. Montesinos, G. Haro

Papers citing "VocaLiST: An Audio-Visual Synchronisation Model for Lips and Voices"

11 papers shown
SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation
Zeyu Ling, Xiaodong Gu, Jiangnan Tang, Changqing Zou
11 Oct 2025
Mask-Free Audio-driven Talking Face Generation for Enhanced Visual Quality and Identity Preservation
Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Barmann, H. K. Ekenel, Alexander H. Waibel
28 Jul 2025
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Marco Comunità, R. F. Gramaccioni, Emilian Postolache, Emanuele Rodolà, Danilo Comminiello, Joshua D. Reiss
23 Oct 2023
GestSync: Determining who is speaking without a talking head
British Machine Vision Conference (BMVC), 2023
Sindhu B. Hegde, Andrew Zisserman
08 Oct 2023
Speech inpainting: Context-based speech synthesis guided by video
Interspeech, 2023
Juan F. Montesinos, Daniel Michelsanti, G. Haro, Zheng-Hua Tan, Jesper Jensen
01 Jun 2023
Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
Antoni Bigata Casademunt, Rodrigo Mira, Nikita Drobyshev, Konstantinos Vougioukas, Stavros Petridis, Maja Pantic
15 May 2023
ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Akash Gupta, Rohun Tripathi, Won-Kap Jang
21 Mar 2023
Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors
IEEE International Conference on Computer Vision (ICCV), 2022
Zhentao Yu, Zixin Yin, Deyu Zhou, Duomin Wang, Finn Wong, Baoyuan Wang
07 Dec 2022
Multimodal Transformer Distillation for Audio-Visual Synchronization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Xuan-Bo Chen, Haibin Wu, Chung-Che Wang, Hung-yi Lee, J. Jang
27 Oct 2022
Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors
British Machine Vision Conference (BMVC), 2022
Vladimir E. Iashin, Weidi Xie, Esa Rahtu, Andrew Zisserman
13 Oct 2022
Deep Learning for Visual Speech Analysis: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Changchong Sheng, Gangyao Kuang, L. Bai, Chen Hou, Yike Guo, Xin Xu, M. Pietikäinen, Tianpeng Liu
22 May 2022