NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

22 March 2023

Fan Yang

Zicheng Liu

Papers citing "NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation"

50 / 77 papers shown

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

204

02 Dec 2025

TalkingPose: Efficient Face and Gesture Animation with Feedback-guided Diffusion Model

134

30 Nov 2025

Flow and Depth Assisted Video Prediction with Latent Transformer

149

20 Nov 2025

TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction

114

16 Nov 2025

A Best-of-Both-Worlds Proof for Tsallis-INF without Fenchel Conjugates

Wei-Cheng Lee

Francesco Orabona

123

14 Nov 2025

Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation

152

27 Oct 2025

MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation

116

21 Oct 2025

Terra: Explorable Native 3D World Model with Point Latents

126

16 Oct 2025

Arbitrary Generative Video Interpolation

148

01 Oct 2025

LongLive: Real-time Interactive Long Video Generation

...

241

26 Sep 2025

How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective

...

319

23 Sep 2025

WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception

168

21 Aug 2025

PersonaVlog: Personalized Multimodal Vlog Generation with Multi-Agent Collaboration and Iterative Self-Correction

129

19 Aug 2025

Matrix-game 2.0: An open-source real-time and streaming interactive world model

...

310

18 Aug 2025

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

307

11 Aug 2025

StableAvatar: Infinite-Length Audio-Driven Avatar Video Generation

164

11 Aug 2025

Enhancing Scene Transition Awareness in Video Generation via Post-Training

153

24 Jul 2025

NarrLV: Towards a Comprehensive Narrative-Centric Evaluation for Long Video Generation

549

15 Jul 2025

SimpleGVR: A Simple Baseline for Latent-Cascaded Video Super-Resolution

405

24 Jun 2025

STAGE: A Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation

317

16 Jun 2025

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

...

326

04 Jun 2025

Physics-Guided Motion Loss for Video Generation Model

167

02 Jun 2025

A Survey of Generative Categories and Techniques in Multimodal Generative Models

399

29 May 2025

ProphetDWM: A Driving World Model for Rolling Out Future Actions and Videos

Xiaodong Wang

Peixi Peng

VGen

1.3K

24 May 2025

Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts

Taewon Kang

Ming C. Lin

DiffM VGen

389

22 May 2025

EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation

260

13 May 2025

FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis

Computer Vision and Pattern Recognition (CVPR), 2025

329

02 May 2025

Frame Context Packing and Drift Prevention in Next-Frame-Prediction Video Diffusion Models

534

17 Apr 2025

OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding

601

15 Apr 2025

KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation

281

13 Apr 2025

ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos

...

827

20 Mar 2025

EQ-TAA: Equivariant Traffic Accident Anticipation via Diffusion-Based Accident Video Synthesis

241

16 Mar 2025

Text2Story: Advancing Video Storytelling with Text Guidance

416

08 Mar 2025

KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation

Computer Vision and Pattern Recognition (CVPR), 2025

Konstantinos Vougioukas

357

03 Mar 2025

ASurvey: Spatiotemporal Consistency in Video Generation

274

25 Feb 2025

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

693

18 Feb 2025

Towards Precise Scaling Laws for Video Diffusion Transformers

Computer Vision and Pattern Recognition (CVPR), 2024

...

437

03 Jan 2025

AdaDiff: Adaptive Step Selection for Fast Diffusion Models

332

31 Dec 2024

Grid Diffusion Models for Text-to-Video Generation

Computer Vision and Pattern Recognition (CVPR), 2024

Taegyeong Lee

Soyeong Kwon

Taehwan Kim

313

31 Dec 2024

Enhancing Long Video Generation Consistency without Tuning

325

23 Dec 2024

Video Diffusion Transformers are In-Context Learners

882

14 Dec 2024

Sonic: Shifting Focus to Global Audio Perception in Portrait Animation

Computer Vision and Pattern Recognition (CVPR), 2024

...

411

25 Nov 2024

Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric

...

533

25 Nov 2024

MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation

Computer Vision and Pattern Recognition (CVPR), 2024

445

22 Nov 2024

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

International Conference on Learning Representations (ICLR), 2024

Zongyi Li

Furu Wei

426

27 Oct 2024

EVA: An Embodied World Model for Future Video Anticipation

...

235

20 Oct 2024

Progressive Autoregressive Video Diffusion Models

314

10 Oct 2024

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

Yuqing Wang

Yang Zhao

375

03 Oct 2024

LVCD: Reference-based Lineart Video Colorization with Diffusion Models

ACM Transactions on Graphics (TOG), 2024

Zhitong Huang

Mohan Zhang

Jing Liao

DiffM VGen

305

19 Sep 2024

DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving

Xuan Di

218

29 Aug 2024