Papers citing 'Learning to Generate Long-term Future via Hierarchical Prediction'

Title
Bridging Text and Video Generation: A Survey Nilay Kumar Priyansh Bhandari G. Maragatham VGen 204 0 0 06 Oct 2025
MoReFlow: Motion Retargeting Learning through Unsupervised Flow Matching Wontaek Kim Tianyu Li Sehoon Ha 129 0 0 29 Sep 2025
Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances Yuanzhi Liang Yijie Fang Rui Li Ziqi Ni Ruijie Su Chi Zhang EGVM 166 2 0 14 Aug 2025
FG-DFPN: Flow Guided Deformable Frame Prediction Network M. Akın Yılmaz Ahmet Bilican A. Murat Tekalp 225 0 0 14 Mar 2025
Continuous Video Process: Modeling Videos as Continuous Multi-Dimensional Processes for Video Prediction Gaurav Shrivastava Abhinav Shrivastava VGen DiffM 228 0 0 06 Dec 2024
COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation European Conference on Computer Vision (ECCV), 2024 Jiefeng Li Ye Yuan Davis Rempe Haotian Zhang Pavlo Molchanov Cewu Lu Jan Kautz Umar Iqbal DiffM VGen 227 4 0 29 Aug 2024
Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction Xue Bai Tasmiah Haque S. Mohan Yuliang Cai Byungheon Jeong Adam Halasz Srinjoy Das 154 1 0 17 Mar 2024
Predictive Temporal Attention on Event-based Video Stream for Energy-efficient Situation Awareness Yiming Bu Jiayang Liu Qinru Qiu 131 2 0 14 Feb 2024
Modeling Spatio-temporal Dynamical Systems with Neural Discrete Learning and Levels-of-Experts IEEE Transactions on Knowledge and Data Engineering (TKDE), 2024 Kun Wang Hao Wu Guibin Zhang Cunchun Li Yuxuan Liang Yuankai Wu Roger Zimmermann Yang Wang 141 18 0 06 Feb 2024
SFGANS Self-supervised Future Generator for human ActioN Segmentation Or Berman Adam Goldbraikh S. Laufer 203 0 0 31 Dec 2023
HMP: Hand Motion Priors for Pose and Shape Estimation from Video Enes Duran Muhammed Kocabas Vasileios Choutas Zicong Fan Michael J. Black 3DH 159 14 0 27 Dec 2023
Earthfarseer: Versatile Spatio-Temporal Dynamical Systems Modeling in One Model Hao Wu Yuxuan Liang Wei Xiong Zhengyang Zhou Wei-Ming Huang Shilong Wang Kun Wang AI4TS 317 18 0 13 Dec 2023
PACE: Human and Camera Motion Estimation from in-the-wild Videos Muhammed Kocabas Ye Yuan Pavlo Molchanov Yunrong Guo Michael J. Black Otmar Hilliges Jan Kautz Umar Iqbal 3DH 173 29 0 20 Oct 2023
Predicting Future Spatiotemporal Occupancy Grids with Semantics for Autonomous Driving Maneekwan Toyungyernsub Esen Yel Jiachen Li Mykel J. Kochenderfer 175 4 0 03 Oct 2023
Dancing Avatar: Pose and Text-Guided Human Motion Videos Synthesis with Image Diffusion Model Bosheng Qin Wentao Ye Qifan Yu Siliang Tang Yueting Zhuang DiffM VGen 94 18 0 15 Aug 2023
Does Unpredictability Influence Driving Behavior? IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023 Sepehr Samavi Florian Shkurti Angela P. Schoellig 111 1 0 28 Jul 2023
DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles Tal Daniel Aviv Tamar DiffM 210 12 0 09 Jun 2023
Putting People in Their Place: Affordance-Aware Human Insertion into Scenes Computer Vision and Pattern Recognition (CVPR), 2023 Sumith Kulal Tim Brooks A. Aiken Jiajun Wu Jimei Yang Jingwan Lu Alexei A. Efros Krishna Kumar Singh DiffM 156 56 0 27 Apr 2023
Combining Vision and Tactile Sensation for Video Prediction Willow Mandil Amir M. Ghalamzan-E. 80 4 0 21 Apr 2023
Prior based Sampling for Adaptive LiDAR Amit Shomer S. Avidan 3DV 3DPC MDE 220 1 0 14 Apr 2023
Model-Based Reinforcement Learning with Isolated Imaginations IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023 Minting Pan Geng Chen Yitao Zheng Yunbo Wang Xiaokang Yang 285 2 0 27 Mar 2023
Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers Computer Vision and Pattern Recognition (CVPR), 2023 Jaehoon Yoo Semin Kim Doyup Lee Chiheon Kim Seunghoon Hong 193 6 0 20 Mar 2023
Implicit Stacked Autoregressive Model for Video Prediction Min-seok Seo Hakjin Lee Do-Yeon Kim Junghoon Seo VGen 107 20 0 14 Mar 2023
Locomotion-Action-Manipulation: Synthesizing Human-Scene Interactions in Complex 3D Environments IEEE International Conference on Computer Vision (ICCV), 2023 Jiye Lee Hanbyul Joo 272 49 0 09 Jan 2023
Motion and Context-Aware Audio-Visual Conditioned Video Prediction British Machine Vision Conference (BMVC), 2022 Yating Xu Conghui Hu G. Lee VGen 325 1 0 09 Dec 2022
MIMO Is All You Need: A Strong Multi-In-Multi-Out Baseline for Video Prediction Shuliang Ning Mengcheng Lan Yanran Li Chaofeng Chen Qian Chen Xunlai Chen Xiaoguang Han Shuguang Cui 179 27 0 09 Dec 2022
SimVP: Towards Simple yet Powerful Spatiotemporal Predictive Learning IEEE Transactions on Multimedia (IEEE TMM), 2022 Cheng Tan Zhangyang Gao Siyuan Li Stan Z. Li VLM AI4TS 197 22 0 22 Nov 2022
Autoregressive GAN for Semantic Unconditional Head Motion Generation Louis Airale Xavier Alameda-Pineda Stéphane Lathuilière Dominique Vaufreydaz 181 4 0 02 Nov 2022
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models International Conference on Learning Representations (ICLR), 2022 Ziyi Wu Nikita Dvornik Klaus Greff Thomas Kipf Animesh Garg OCL BDL 286 112 0 12 Oct 2022
Hierarchical Capsule Prediction Network for Marketing Campaigns Effect International Conference on Information and Knowledge Management (CIKM), 2022 Zhixuan Chu Hui Ding Guang Zeng Yuchen Huang T. Yan Yulin Kang Sheng Li 148 9 0 22 Aug 2022
A new way of video compression via forward-referencing using deep learning S. Rajin M. Murshed M. Paul S. Teng J. Ma 74 0 0 13 Aug 2022
Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources Concurrency and Computation (CCPE), 2022 Ji Liu Daxiang Dong Xi Wang An Qin Xingjian Li P. Valduriez Dejing Dou Dianhai Yu 142 8 0 14 Jul 2022
Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet Shihao Zou Yuanlu Xu Chao Li Lingni Ma Li Cheng Minh Vo 243 17 0 09 Jul 2022
SimVP: Simpler yet Better Video Prediction Computer Vision and Pattern Recognition (CVPR), 2022 Zhangyang Gao Cheng Tan Lirong Wu Stan Z. Li 261 308 0 09 Jun 2022
Patch-based Object-centric Transformers for Efficient Video Generation Wilson Yan Ryogo Okumura Stephen James Pieter Abbeel DiffM ViT 207 6 0 08 Jun 2022
FlexLip: A Controllable Text-to-Lip System Italian National Conference on Sensors (INS), 2022 Dan Oneaţă Beáta Lőrincz Adriana Stan H. Cucu 131 5 0 07 Jun 2022
Cascaded Video Generation for Videos In-the-Wild International Conference on Pattern Recognition (ICPR), 2022 Lluis Castrejon Nicolas Ballas Aaron Courville VGen 158 0 0 01 Jun 2022
Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models Neural Information Processing Systems (NeurIPS), 2022 Minting Pan Geng Chen Yunbo Wang Xiaokang Yang 257 53 0 27 May 2022
Future Object Detection with Spatiotemporal Transformers Adam Tonderski Joakim Johnander Christoffer Petersson Kalle Åström ViT 135 1 0 21 Apr 2022
When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning Chuizheng Meng Sungyong Seo Defu Cao Sam Griesemer Yan Liu PINN AI4CE 250 106 0 31 Mar 2022
Stochastic Video Prediction with Structure and Motion Adil Kaan Akan Sadra Safadoust Fatma Guney VGen 156 10 0 20 Mar 2022
Transframer: Arbitrary Frame Prediction with Generative Models C. Nash João Carreira Jacob Walker Iain Barr Andrew Jaegle Mateusz Malinowski Peter W. Battaglia ViT 245 44 0 17 Mar 2022
MSPred: Video Prediction at Multiple Spatio-Temporal Scales with Hierarchical Recurrent Networks British Machine Vision Conference (BMVC), 2022 Angel Villar-Corrales Ani J. Karapetyan Andreas Boltres Sven Behnke 291 12 0 17 Mar 2022
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning Computer Vision and Pattern Recognition (CVPR), 2022 Ligong Han Jian Ren Hsin-Ying Lee Francesco Barbieri Kyle Olszewski Shervin Minaee Dimitris N. Metaxas Sergey Tulyakov DiffM VGen 201 45 0 04 Mar 2022
Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space International Conference on Learning Representations (ICLR), 2022 Steeven Janny Fabien Baradel Natalia Neverova M. Nadri Greg Mori Christian Wolf CML 184 17 0 01 Feb 2022
Autoencoding Video Latents for Adversarial Video Generation Sai Hemanth Kasaraneni VGen 90 3 0 18 Jan 2022
Image Animation with Keypoint Mask Or Toledano Yanir Marmor Dov Gertz VGen 112 2 0 20 Dec 2021
A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos Xianling Zeng Yalong Jiang Wenrui Ding Hongguang Li Yafeng Hao Zifeng Qiu 174 75 0 08 Dec 2021
GLAMR: Global Occlusion-Aware Human Mesh Recovery with Dynamic Cameras Ye Yuan Umar Iqbal Pavlo Molchanov Kris Kitani Jan Kautz 3DH 278 147 0 02 Dec 2021
Layered Controllable Video Generation Jiahui Huang Yuhe Jin K. M. Yi Leonid Sigal VGen 294 11 0 24 Nov 2021