CelebV-Text: A Large-Scale Facial Text-Video Dataset

Computer Vision and Pattern Recognition (CVPR), 2023

26 March 2023

Jianhui Yu

Chen Change Loy

Weidong (Tom) Cai

Papers citing "CelebV-Text: A Large-Scale Facial Text-Video Dataset"

50 / 61 papers shown

IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer

128

0

0

27 Nov 2025

MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices

253

0

0

26 Nov 2025

Back to the Feature: Explaining Video Classifiers with Video Counterfactual Explanations

Luis C. Garcia-Peraza-Herrera

287

0

0

25 Nov 2025

VidEmo: Affective-Tree Reasoning for Emotion-Centric Video Foundation Models

159

2

0

04 Nov 2025

What If : Understanding Motion Through Sparse Interactions

188

1

0

14 Oct 2025

SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation

196

0

0

11 Oct 2025

Durian: Dual Reference Image-Guided Portrait Animation with Attribute Transfer

178

0

0

04 Sep 2025

Human Motion Video Generation: A Survey

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

...

269

33

0

04 Sep 2025

Towards High-Fidelity, Identity-Preserving Real-Time Makeup Transfer: Decoupling Style Generation

Lydia Kin Ching Chau

251

0

0

02 Sep 2025

MoSA: Motion-Coherent Human Video Generation via Structure-Appearance Decoupling

206

0

0

24 Aug 2025

DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation

162

1

0

29 Jul 2025

MoDA: Multi-modal Diffusion Architecture for Talking Head Generation

314

0

0

04 Jul 2025

Sonic4D: Spatial Audio Generation for Immersive 4D Scene Exploration

Zhibo Chen

397

4

0

18 Jun 2025

EchoShot: Multi-Shot Portrait Video Generation

281

11

0

16 Jun 2025

Exploring Timeline Control for Facial Motion Generation

Computer Vision and Pattern Recognition (CVPR), 2025

283

2

0

27 May 2025

KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution

Stella Bounareli

Michał Stypułkowski

Konstantinos Vougioukas

Stavros Petridis

423

8

0

01 May 2025

Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis

499

2

0

19 Apr 2025

Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

ACM Multimedia (ACM MM), 2025

...

469

17

0

14 Apr 2025

FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment

853

4

0

12 Apr 2025

VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models

337

8

0

03 Apr 2025

Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation

608

14

0

03 Apr 2025

HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation

Computer Vision and Pattern Recognition (CVPR), 2025

343

15

0

31 Mar 2025

MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation

Computer Vision and Pattern Recognition (CVPR), 2025

349

14

0

25 Mar 2025

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

421

34

0

20 Mar 2025

Visual Persona: Foundation Model for Full-Body Human Customization

Computer Vision and Pattern Recognition (CVPR), 2025

384

8

0

19 Mar 2025

Personalized Generation In Large Model Era: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

639

42

0

04 Mar 2025

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Computer Vision and Pattern Recognition (CVPR), 2025

Michał Stypułkowski

Stella Bounareli

Konstantinos Vougioukas

Nikita Drobyshev

Stavros Petridis

449

10

0

03 Mar 2025

FLAP: Fully-controllable Audio-driven Portrait Video Generation through 3D head conditioned diffusion model

Ruonan Zhang

Guiming Mo

Jiawei Jin

Kai Zhang

Haozhi Huang

610

0

0

26 Feb 2025

PERSE: Personalized 3D Generative Avatars from A Single Portrait

Computer Vision and Pattern Recognition (CVPR), 2024

299

14

0

30 Dec 2024

Omni-ID: Holistic Identity Representation Designed for Generative Tasks

Computer Vision and Pattern Recognition (CVPR), 2024

Kuan-Chieh Wang

Daniil Ostashev

Sergey Tulyakov

Daniel Cohen-Or

483

19

0

12 Dec 2024

HiFiVFS: High Fidelity Video Face Swapping

427

8

0

27 Nov 2024

MotionCharacter: Fine-Grained Motion Controllable Human Video Generation

287

12

0

27 Nov 2024

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Computer Vision and Pattern Recognition (CVPR), 2024

...

463

59

0

25 Nov 2024

HumanVLM: Foundation for Human-Scene Vision-Language Model

Information Fusion (Inf. Fusion), 2024

431

15

0

05 Nov 2024

Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions

International Conference on 3D Vision (3DV), 2024

248

13

0

21 Oct 2024

MMHead: Towards Fine-grained Multi-modal 3D Facial Animation

ACM Multimedia (MM), 2024

Yunhao Li

Guangtao Zhai

291

23

0

10 Oct 2024

Face Forgery Detection with Elaborate Backbone

Jie Zhang

Shiguang Shan

342

2

0

25 Sep 2024

DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation

478

2

0

23 Sep 2024

InstantDrag: Improving Interactivity in Drag-based Image Editing

ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024

Jaesik Park

323

34

0

13 Sep 2024

What to Preserve and What to Transfer: Faithful, Identity-Preserving Diffusion-based Hairstyle Transfer

AAAI Conference on Artificial Intelligence (AAAI), 2024

Sunghyun Park

187

6

0

29 Aug 2024

15M Multimodal Facial Image-Text Dataset

Zhang YuanHui

Guoyin Wang

471

20

0

11 Jul 2024

OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation

Zhenheng Yang

Zhijie Chen

Jian Yang

Ying Tai

763

234

0

02 Jul 2024

MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset

278

23

0

20 Jun 2024

V-LASIK: Consistent Glasses-Removal from Videos Using Synthetic Data

Rotem Shalev-Arkushin

Eitan Richardson

Amit H. Bermano

372

0

0

20 Jun 2024

From Sora What We Can See: A Survey of Text-to-Video Generation

301

43

0

17 May 2024

ID-Animator: Zero-Shot Identity-Preserving Human Video Generation

Tao Hu

Ke Cao

534

104

0

23 Apr 2024

3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow

216

12

0

15 Apr 2024

Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer

325

56

0

20 Mar 2024

VidProM: A Million-scale Real Prompt-Gallery Dataset for Text-to-Video Diffusion Models

Neural Information Processing Systems (NeurIPS), 2024

Yi Yang

540

91

0

10 Mar 2024

Detecting Multimedia Generated by Large AI Models: A Survey

Luisa Verdoliva

1.1K

101

0

22 Jan 2024
