EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face Animation

21 August 2024
Yihong Lin
Liang Peng
Jianqiao Hu
Xiandong Li
Wenxiong Kang
Huang Xu
CV · BM · 3DH
arXiv (abs) · PDF · HTML
Main: 8 pages · 7 figures · 6 tables · Bibliography: 2 pages · Appendix: 4 pages
Abstract

The creation of increasingly vivid 3D talking faces has become a hot topic in recent years. Currently, most speech-driven works focus on lip synchronisation but neglect to effectively capture the correlations between emotions and facial motions. To address this problem, we propose a two-stream network called EmoFace, which consists of an emotion branch and a content branch. EmoFace employs a novel Mesh Attention mechanism to analyse and fuse the emotion features and content features. In particular, a newly designed spatio-temporal graph-based convolution, SpiralConv3D, is used in Mesh Attention to learn potential temporal and spatial feature dependencies between mesh vertices. In addition, to the best of our knowledge, this is the first work to introduce a self-growing training scheme with intermediate supervision that dynamically adjusts the ratio of ground truth used in the 3D face animation task. Comprehensive quantitative and qualitative evaluations on our high-quality 3D emotional facial animation dataset, 3D-RAVDESS (4.8863×10⁻⁵ mm for LVE and 0.9509×10⁻⁵ mm for EVE), together with the public dataset VOCASET (2.8669×10⁻⁵ mm for LVE and 0.4664×10⁻⁵ mm for EVE), demonstrate that our approach achieves state-of-the-art performance.
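
The abstract describes a two-stream architecture (an emotion branch and a content branch) fused per vertex by a Mesh Attention module built on a spatio-temporal spiral convolution, SpiralConv3D. The sketch below is not the authors' implementation: the class names, spiral index lists, layer sizes, and the gating-style fusion are assumptions made purely to illustrate the idea of spatial spiral gathering followed by temporal convolution and two-stream fusion.

```python
# Minimal sketch (assumed, not the paper's code) of a two-stream mesh model:
# spiral gather over vertex neighbours + temporal conv, then gated fusion of
# content and emotion features into per-vertex displacements.
import torch
import torch.nn as nn


class SpiralConv3DSketch(nn.Module):
    """Spatio-temporal vertex convolution: gather each vertex's precomputed
    spiral neighbours, mix them with a linear layer, then convolve over time.
    spiral_idx: (V, K) long tensor of neighbour indices per vertex."""

    def __init__(self, in_dim, out_dim, spiral_idx, t_kernel=3):
        super().__init__()
        self.register_buffer("spiral_idx", spiral_idx)            # (V, K)
        k = spiral_idx.shape[1]
        self.spatial = nn.Linear(in_dim * k, out_dim)              # mixes the K spiral neighbours
        self.temporal = nn.Conv1d(out_dim, out_dim, t_kernel, padding=t_kernel // 2)

    def forward(self, x):                                          # x: (B, T, V, C)
        b, t, v, c = x.shape
        idx = self.spiral_idx.reshape(-1)                          # (V*K,)
        gathered = x[:, :, idx, :].reshape(b, t, v, -1)            # (B, T, V, K*C)
        h = self.spatial(gathered)                                 # (B, T, V, D)
        h = h.permute(0, 2, 3, 1).reshape(b * v, -1, t)            # (B*V, D, T)
        h = self.temporal(h).reshape(b, v, -1, t).permute(0, 3, 1, 2)
        return torch.relu(h)                                       # (B, T, V, D)


class TwoStreamFusionSketch(nn.Module):
    """Fuse content (speech-driven) and emotion features per vertex with a
    simple sigmoid gate; the paper's Mesh Attention is more elaborate."""

    def __init__(self, dim, spiral_idx):
        super().__init__()
        self.content_conv = SpiralConv3DSketch(dim, dim, spiral_idx)
        self.emotion_conv = SpiralConv3DSketch(dim, dim, spiral_idx)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.out = nn.Linear(dim, 3)                               # per-vertex XYZ offset

    def forward(self, content_feat, emotion_feat):                 # both (B, T, V, C)
        c = self.content_conv(content_feat)
        e = self.emotion_conv(emotion_feat)
        a = self.gate(torch.cat([c, e], dim=-1))                   # attention weights in [0, 1]
        fused = a * c + (1 - a) * e
        return self.out(fused)                                     # (B, T, V, 3) vertex displacements


if __name__ == "__main__":
    V, K, C = 50, 8, 32                                            # toy mesh: 50 vertices, spiral length 8
    spirals = torch.randint(0, V, (V, K))
    model = TwoStreamFusionSketch(C, spirals)
    offsets = model(torch.randn(2, 10, V, C), torch.randn(2, 10, V, C))
    print(offsets.shape)                                           # torch.Size([2, 10, 50, 3])
```

The abstract also mentions a self-growing training scheme that dynamically adjusts the ratio of ground truth used during training. A generic scheduled-sampling-style decay, again only an assumed stand-in for the paper's actual scheme and its intermediate supervision, might look like:

```python
# Hedged sketch: the probability of feeding ground-truth frames back as
# conditioning decays over training, so the model gradually relies on its
# own predictions. The paper's real schedule is not reproduced here.
def ground_truth_ratio(epoch: int, total_epochs: int, start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate the ground-truth ratio from `start` to `end`."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac
```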
