See the Speaker: Crafting High-Resolution Talking Faces from Speech with Prior Guidance and Region Refinement. IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
MOS: Modeling Object-Scene Associations in Generalized Category Discovery. Computer Vision and Pattern Recognition (CVPR), 2025
Mimir: Improving Video Diffusion Models for Precise Text Understanding. Computer Vision and Pattern Recognition (CVPR), 2024
EmoGene: Audio-Driven Emotional 3D Talking-Head Generation. IEEE International Conference on Automatic Face & Gesture Recognition (FG), 2024
Diverse Code Query Learning for Speech-Driven Facial Animation. IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis. European Conference on Computer Vision (ECCV), 2024
TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles. IEEE Transactions on Multimedia (TMM), 2023