arXiv: 2406.08801
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

13 June 2024
Mingwang Xu
Hui Li
Qingkun Su
Hanlin Shang
Liwei Zhang
Ce Liu
Jingdong Wang
Yao Yao
Siyu Zhu
    VGen

Papers citing "Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation"

50 / 99 papers shown
EmoCAST: Emotional Talking Portrait via Emotive Text Description
Yiguo Jiang
Xiaodong Cun
Yong Zhang
Yudian Zheng
Fan Tang
Chi-Man Pun
DiffM
72
0
0
24 Dec 2025
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
Zhizhou Zhong
Yicheng Ji
Zhe Kong
Y. Liu
Jiarui Wang
...
Ying Qin
Huan Li
Shuiyang Mao
W. Liu
Wenhan Luo
DiffM, VGen
60
0
0
28 Nov 2025
IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer
Bo Chen
Tao Liu
Qi Chen
Xie Chen
Zilong Zheng
VGen
44
0
0
27 Nov 2025
ConsistTalk: Intensity Controllable Temporally Consistent Talking Head Generation with Diffusion Noise Search
Zhenjie Liu
Jianzhang Lu
Renjie Lu
Cong Liang
S. Wang
DiffM, VGen
249
0
0
10 Nov 2025
See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Jinting Wang
Jun Wang
Hei Victor Cheng
Li Liu
DiffM
104
0
0
28 Oct 2025
Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
Junyoung Seo
Rodrigo Mira
A. Haliassos
Stella Bounareli
Honglie Chen
Linh Tran
Seungryong Kim
Zoe Landgraf
Jie Shen
VGen
121
1
0
27 Oct 2025
MAGIC-Talk: Motion-aware Audio-Driven Talking Face Generation with Customizable Identity Control
Fatemeh Nazarieh
Zhenhua Feng
Diptesh Kanojia
Muhammad Awais
J. Kittler
DiffM, VGen
76
1
0
26 Oct 2025
Playmate2: Training-Free Multi-Character Audio-Driven Animation via Diffusion Transformer with Reward Feedback
Xingpei Ma
Shenneng Huang
Jiaran Cai
Yuansheng Guan
Shen Zheng
HanFeng Zhao
Qiang Zhang
Shunsi Zhang
VGen
137
3
0
14 Oct 2025
DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis
Peiyin Chen
Zhuowei Yang
Hui Feng
Sheng Jiang
Rui Yan
DiffM, VGen
76
0
0
12 Oct 2025
VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
Donglin Huang
Yongyuan Li
Tianhang Liu
Junming Huang
Xiaoda Yang
Chi-Yin Wang
Weiwei Xu
VGen
118
1
0
11 Oct 2025
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
Cheng-Han Chiang
Xiaofei Wang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Shujie Liu
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
LLMAG, ReLM, RALM, LRM
164
3
0
08 Oct 2025
StableDub: Taming Diffusion Prior for Generalized and Efficient Visual Dubbing
Liyang Chen
Tianze Zhou
Xu He
Boshi Tang
Zhiyong Wu
Yang Huang
Yang Wu
Zhongqian Sun
Wei Yang
Helen M. Meng
DiffM
157
0
0
26 Sep 2025
X-Streamer: Unified Human World Modeling with Audiovisual Interaction
You Xie
Tianpei Gu
Zenan Li
Chenxu Zhang
Guoxian Song
Xiaochen Zhao
C. Liang
Jianwen Jiang
Hongyi Xu
Linjie Luo
VGen
157
2
0
25 Sep 2025
Talking Head Generation via AU-Guided Landmark Prediction
Shao-Yu Chang
Jingyi Xu
H. Le
Dimitris Samaras
136
1
0
24 Sep 2025
SynchroRaMa : Lip-Synchronized and Emotion-Aware Talking Face Generation via Multi-Modal Emotion Embedding
Phyo Thet Yee
D. Kollias
Sudeepta Mishra
Abhinav Dhall
VGen
88
2
0
24 Sep 2025
DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces
Tianshuo Zhang
Li Gao
Siran Peng
Xiangyu Zhu
Zhen Lei
144
0
0
23 Sep 2025
Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation
Yue Ma
Zexuan Yan
Hongyu Liu
H. Wang
Heng Pan
...
H. Shum
Zhifeng Li
Wei Liu
Linfeng Zhang
Qifeng Chen
VGen
195
12
0
20 Sep 2025
AvatarSync: Rethinking Talking-Head Animation through Phoneme-Guided Autoregressive Perspective
Yuchen Deng
Xiuyang Wu
Hai-Tao Zheng
Suiyang Zhang
Yi He
Yuxing Han
VGen
80
0
0
15 Sep 2025
Human Motion Video Generation: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Haiwei Xue
Xiangyang Luo
Zhanghao Hu
Shu Zhang
Xunzhi Xiang
...
Fei Ma
Zhiyong Wu
Changpeng Yang
Zonghong Dai
Fei Richard Yu
EGVM, VGen
221
23
0
04 Sep 2025
InfinityHuman: Towards Long-Term Audio-Driven Human
X. Li
Pan Xie
Yi Ren
Qijun Gan
Chen Zhang
Fangyuan Kong
Xiang Yin
Bingyue Peng
Zehuan Yuan
VGen
121
3
0
27 Aug 2025
OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation
Jianwen Jiang
Weihong Zeng
Zerong Zheng
Jiaqi Yang
Chao Liang
Wang Liao
Han Liang
Yuan Zhang
Mingyuan Gao
VGen
97
10
0
26 Aug 2025
Lightning Fast Caching-based Parallel Denoising Prediction for Accelerating Talking Head Generation
Jianzhi Long
Wenhao Sun
Rongcheng Tu
Dacheng Tao
DiffM, VGen
117
0
0
25 Aug 2025
TalkVid: A Large-Scale Diversified Dataset for Audio-Driven Talking Head Synthesis
Shunian Chen
Hejin Huang
Yexin Liu
Zihan Ye
Kai Chen
...
Junying Chen
Guanbin Li
Ser-Nam Lim
Harry Yang
Benyou Wang
EGVM, VGen
84
2
0
19 Aug 2025
InfiniteTalk: Audio-driven Video Generation for Sparse-Frame Video Dubbing
Shaoshu Yang
Zhe Kong
Feng Gao
Meng Cheng
Xiangyu Liu
...
Zhuoliang Kang
Tong Lu
Xunliang Cai
Ran He
Xiaoming Wei
VGen
99
7
0
19 Aug 2025
EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis
Shuai Tan
Bin Ji
154
0
0
19 Aug 2025
StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation
S. Tu
Yueming Pan
Y. Huang
Xintong Han
Zhen Xing
Jingdong Sun
Chong Luo
Zuxuan Wu
Yu-Gang Jiang
VGen
124
13
0
11 Aug 2025
RAP: Real-time Audio-driven Portrait Animation with Video Diffusion Transformer
Fangyu Du
Taiqing Li
Ziwei Zhang
Qian Qiao
Tan Yu
Dingcheng Zhen
Xu Jia
Yang Yang
Shunshun Yin
Siyuan Liu
VGen
92
2
0
07 Aug 2025
READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation
Haotian Wang
Yuzhe Weng
Jun Du
Haoran Xu
X. Wu
Shan He
Bing Yin
Cong Liu
J. Gao
Qingfeng Liu
DiffM, VGen
204
1
0
05 Aug 2025
Text2Lip: Progressive Lip-Synced Talking Face Generation from Text via Viseme-Guided Rendering
Xu Wang
Shengeng Tang
Fei Wang
L. T. Cheng
Dan Guo
Feng Xue
Richang Hong
90
1
0
04 Aug 2025
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
International Conference on Learning Representations (ICLR), 2025
Xiaochen Zhao
Hongyi Xu
Guoxian Song
You Xie
Chenxu Zhang
Xiu Li
Linjie Luo
J. Suo
Yebin Liu
VGen
144
16
0
30 Jul 2025
DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation
He Feng
Yongjia Ma
Donglin Di
Lei Fan
Tonghua Su
Xiangqian Wu
DiffM, VGen
105
1
0
29 Jul 2025
JOLT3D: Joint Learning of Talking Heads and 3DMM Parameters with Application to Lip-Sync
Sungjoon Park
Minsik Park
Haneol Lee
Jaesub Yun
Donggeon Lee
3DH
129
0
0
28 Jul 2025
MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation
Shuolin Xu
Bingyuan Wang
Zeyu Cai
Fangteng Fu
Yue Ma
Tongyi Lee
Hongchuan Yu
Zeyu Wang
VGen
153
1
0
27 Jul 2025
Face2VoiceSync: Lightweight Face-Voice Consistency for Text-Driven Talking Face Generation
Fang Kang
Yin Cao
Haoyu Chen
188
1
0
25 Jul 2025
EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation
Rang Meng
Y. Wang
Weipeng Wu
Ruobing Zheng
Yuming Li
Chenguang Ma
VGen, 3DH
203
11
0
05 Jul 2025
MoDA: Multi-modal Diffusion Architecture for Talking Head Generation
Xinyang Li
Gen Li
Zhihui Lin
Yichen Qian
Gongxin Yao
Weinan Jia
Aowen Wang
Weihua Chen
Fan Wang
DiffM, VGen
222
0
0
04 Jul 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM, VGen
1.2K
77
0
01 Jul 2025
iDiT-HOI: Inpainting-based Hand Object Interaction Reenactment via Video Diffusion Transformer
Zhelun Shen
Chenming Wu
Junsheng Zhou
Chen Zhao
Kaisiyuan Wang
Hang Zhou
Yingying Li
Haocheng Feng
Wei He
Jingdong Wang
DiffM
218
0
0
15 Jun 2025
LLIA -- Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models
Haojie Yu
Zhaonian Wang
Yihan Pan
Meng Cheng
Hao Yang
Chao Wang
Tao Xie
Xiaoming Xu
Xiaoming Wei
Xunliang Cai
VGen
220
2
0
06 Jun 2025
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Yuan Gan
Jiaxu Miao
Yunze Wang
Yi Yang
AAML, DiffM
134
2
0
02 Jun 2025
Hallo4: High-Fidelity Dynamic Portrait Animation via Direct Preference Optimization
Jiahao Cui
Yan Chen
Mingwang Xu
Hanlin Shang
Yuxuan Chen
Yun Zhan
Zilong Dong
Yao Yao
Jingdong Wang
Siyu Zhu
DiffM, VGen
472
8
0
29 May 2025
MMGT: Motion Mask Guided Two-Stage Network for Co-Speech Gesture Video Generation
Siyuan Wang
Jiawei Liu
Wei Wang
Yeying Jin
Jinsong Du
Zhi Han
SLR, VGen
225
0
0
29 May 2025
Speaking images. A novel framework for the automated self-description of artworks
Valentine Bernasconi
Gustavo Marfia
VGen
98
0
0
28 May 2025
FaceEditTalker: Controllable Talking Head Generation with Facial Attribute Editing
Guanwen Feng
Zhiyuan Ma
Yunan Li
Junwei Jing
Junwei Jing
Qiguang Miao
196
0
0
28 May 2025
Exploring Timeline Control for Facial Motion Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Yifeng Ma
Jinwei Qi
Chaonan Ji
Peng Zhang
Bang Zhang
Zhidong Deng
Liefeng Bo
VGen
232
0
0
27 May 2025
HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters
Yi Chen
Sen Liang
Zixiang Zhou
Ziyao Huang
Yifeng Ma
Junshu Tang
Qin Lin
Yuan Zhou
Qinglin Lu
VGen
246
24
0
26 May 2025
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection
Jiaxin Liu
Jia Wang
Saihui Hou
Min Ren
Huijia Wu
Long Ma
Renwang Pei
Zhaofeng He
DiffM
408
6
0
22 May 2025
MAVOS-DD: Multilingual Audio-Video Open-Set Deepfake Detection Benchmark
Florinel-Alin Croitoru
Vlad Hondru
Marius Popescu
Radu Tudor Ionescu
Fahad Shahbaz Khan
Mubarak Shah
236
2
0
16 May 2025
DATA: Multi-Disentanglement based Contrastive Learning for Open-World Semi-Supervised Deepfake Attribution
IEEE Transactions on Multimedia (TMM), 2025
Ming-Hui Liu
Xiao-Qian Liu
Xin Luo
Xin-Shun Xu
225
3
0
07 May 2025
KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
Antoni Bigata
Rodrigo Mira
Stella Bounareli
Michał Stypułkowski
Konstantinos Vougioukas
Stavros Petridis
Maja Pantic
277
3
0
01 May 2025