ResearchTrend.AI
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2023
12 December 2023
Tomáš Souček
Dima Damen
Michael Wray
Ivan Laptev
Josef Sivic
    VGen

Papers citing "GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos"

26 / 26 papers shown
A Step Toward World Models: A Survey on Robotic Manipulation
Peng-Fei Zhang
Ying Cheng
Xiaofan Sun
S. Wang
Lei Zhu
Lei Zhu
Heng Tao Shen
LM&Ro
542
2
0
31 Oct 2025
Mask2IV: Interaction-Centric Video Generation via Mask Trajectories
Gen Li
Bo Zhao
Jianfei Yang
Laura Sevilla-Lara
VGen
115
2
0
03 Oct 2025
EvoWorld: Evolving Panoramic World Generation with Explicit 3D Memory
Jiahao Wang
Luoxin Ye
Taiming Lu
Junfei Xiao
Jiahan Zhang
...
Xijun Liu
Rama Chellappa
Cheng-Fang Peng
Alan Yuille
Jieneng Chen
VGen
93
1
0
01 Oct 2025
Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
Priyanka Mandikal
Jiaheng Hu
Shivin Dass
Sagnik Majumder
Roberto Martín-Martín
Kristen Grauman
92
1
0
28 Sep 2025
Ego-centric Predictive Model Conditioned on Hand Trajectories
Binjie Zhang
Mike Zheng Shou
EgoV
217
0
0
27 Aug 2025
Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang
Chao Wen
Haoyu Guo
Sida Peng
Minghan Qin
Hujun Bao
Xiaowei Zhou
Ruizhen Hu
VGen
80
2
0
18 Aug 2025
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi
Rabiul Awal
Ankur Sikarwar
Amirhossein Kazemnejad
Ge Ya Luo
...
Sai Rajeswar
Siva Reddy
C. Pal
Benno Krojer
Aishwarya Agrawal
OffRL
KELM
200
2
0
01 Aug 2025
Towards Consistent Long-Term Pose Generation
Yayuan Li
Filippos Bellos
Jason J. Corso
127
0
0
24 Jul 2025
VisualChef: Generating Visual Aids in Cooking via Mask Inpainting
Oleh Kuzyk
Zuoyue Li
Marc Pollefeys
Xi Wang
111
0
0
23 Jun 2025
Enhance Multimodal Consistency and Coherence for Text-Image Plan Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xiaoxin Lu
Ranran Haoran Zhang
Yusen Zhang
Rui Zhang
DiffM
226
0
0
13 Jun 2025
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung
Frangil Ramirez
Juhyung Ha
Yi-Ting Chen
David J. Crandall
Yi-Hsuan Tsai
610
2
0
27 Mar 2025
Latent Beam Diffusion Models for Generating Visual Sequences
Guilherme Fernandes
Vasco Ramos
Regev Cohen
Idan Szpektor
João Magalhães
312
1
0
26 Mar 2025
Stitch-a-Demo: Video Demonstrations from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
198
1
0
18 Mar 2025
SPOC: Spatially-Progressing Object State Change Segmentation in Video
Priyanka Mandikal
Tushar Nagarajan
Alex Stoken
Zihui Xue
Kristen Grauman
178
1
0
15 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
International Conference on Learning Representations (ICLR), 2025
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
320
3
0
12 Mar 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
334
1
0
12 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
607
2
0
10 Feb 2025
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting
AAAI Conference on Artificial Intelligence (AAAI), 2024
Muhammet Furkan Ilaslan
Ali Koksal
Kevin Qinghong Lin
Burak Satar
Mike Zheng Shou
Qianli Xu
LM&Ro
208
2
0
16 Dec 2024
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction
Hyungjin Chung
Dohun Lee
Jong Chul Ye
VGen
DiffM
123
2
0
07 Oct 2024
VEDIT: Latent Prediction Architecture For Procedural Video Representation Learning
International Conference on Learning Representations (ICLR), 2024
Han Lin
Tushar Nagarajan
Nicolas Ballas
Mido Assran
Mojtaba Komeili
Joey Tianyi Zhou
Koustuv Sinha
AI4TS
217
5
0
04 Oct 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
International Conference on Learning Representations (ICLR), 2024
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
288
412
0
22 Aug 2024
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
Eva Portelance
Christopher Pal
Siva Reddy
EGVM
VGen
250
14
0
03 Jul 2024
Coherent Zero-Shot Visual Instruction Generation
Quynh Phung
Songwei Ge
Jia-Bin Huang
277
2
0
06 Jun 2024
SCHEMA: State CHangEs MAtter for Procedure Planning in Instructional Videos
Yulei Niu
Wenliang Guo
Long Chen
Xudong Lin
Shih-Fu Chang
206
21
0
03 Mar 2024
Video as the New Language for Real-World Decision Making
Sherry Yang
Jacob Walker
Jack Parker-Holder
Yilun Du
Jake Bruce
Andre Barreto
Pieter Abbeel
Dale Schuurmans
VGen
223
79
0
27 Feb 2024
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
Bolin Lai
Xiaoliang Dai
Lawrence Chen
Guan Pang
James M. Rehg
Miao Liu
257
21
0
06 Dec 2023