Generative Video Transformer: Can Objects be the Words?

20 July 2021

Papers citing "Generative Video Transformer: Can Objects be the Words?"


Title | Authors | Tags | Counts | Date
RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation | Zhiqiang Yuan Ting Zhang Ying Deng Jiapei Zhang Yeshuang Zhu Zexi Jia Jie Zhou Jinchao Zhang | VGen | 39 0 0 | 22 Mar 2025
Object-Centric Image to Video Generation with Language Guidance | Angel Villar-Corrales Gjergj Plepi Sven Behnke | DiffM VGen OCL | 71 0 0 | 17 Feb 2025
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models | Amir Mohammad Karimi Mamaghan Samuele Papa Karl Henrik Johansson Stefan Bauer Andrea Dittadi | OCL | 37 5 0 | 22 Jul 2024
Graph Transformers: A Survey | Ahsan Shehzad Feng Xia Shagufta Abid Ciyuan Peng Shuo Yu Dongyu Zhang Karin Verspoor | AI4CE | 29 9 0 | 13 Jul 2024
Learning Disentangled Representation in Object-Centric Models for Visual Dynamics Prediction via Transformers | Sanket Gandhi Atul Samanyu Mahajan Vishal Sharma Rushil Gupta Arnab Kumar Mondal Parag Singla | ViT OCL | 35 0 0 | 03 Jul 2024
Slot State Space Models | Jindong Jiang Fei Deng Gautam Singh Minseung Lee Sungjin Ahn | - | 39 4 0 | 18 Jun 2024
Neural Language of Thought Models | Yi-Fu Wu Minseung Lee Sungjin Ahn | MLLM VLM | 48 6 0 | 02 Feb 2024
DDLP: Unsupervised Object-Centric Video Prediction with Deep Dynamic Latent Particles | Tal Daniel Aviv Tamar | DiffM | 14 7 0 | 09 Jun 2023
Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation | A. Davtyan Paolo Favaro | VGen | 11 4 0 | 06 Jun 2023
Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions | Angel Villar-Corrales Ismail Wahdan Sven Behnke | OCL | 11 7 0 | 23 Feb 2023
An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning | Jaesik Yoon Yi-Fu Wu Heechul Bae Sungjin Ahn | OCL | 17 39 0 | 09 Feb 2023
OAMixer: Object-aware Mixing Layer for Vision Transformers | H. Kang Sangwoo Mo Jinwoo Shin | VLM | 29 4 0 | 13 Dec 2022
Neural Systematic Binder | Gautam Singh Yeongbin Kim Sungjin Ahn | OCL | 19 36 0 | 02 Nov 2022
SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models | Ziyi Wu Nikita Dvornik Klaus Greff Thomas Kipf Animesh Garg | OCL BDL | 59 89 0 | 12 Oct 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers | Ran Liu Mehdi Azabou M. Dabagia Jingyun Xiao Eva L. Dyer | AI4CE | 27 19 0 | 10 Jun 2022
Patch-based Object-centric Transformers for Efficient Video Generation | Wilson Yan Ryogo Okumura Stephen James Pieter Abbeel | DiffM ViT | 23 6 0 | 08 Jun 2022
Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos | Gautam Singh Yi-Fu Wu Sungjin Ahn | OCL | 28 113 0 | 27 May 2022
Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning | Ligong Han Jian Ren Hsin-Ying Lee Francesco Barbieri Kyle Olszewski Shervin Minaee Dimitris N. Metaxas Sergey Tulyakov | DiffM VGen | 19 41 0 | 04 Mar 2022
Recent Advances in Vision Transformer: A Survey and Outlook of Recent Work | Khawar Islam | ViT | 24 44 0 | 03 Mar 2022
TransDreamer: Reinforcement Learning with Transformer World Models | Changgu Chen Yi-Fu Wu Jaesik Yoon Sungjin Ahn | OffRL | 16 90 0 | 19 Feb 2022
Illiterate DALL-E Learns to Compose | Gautam Singh Fei Deng Sungjin Ahn | CoGe OCL | 22 131 0 | 17 Oct 2021
TransTrack: Multiple Object Tracking with Transformer | Pei Sun Jinkun Cao Yi-Xin Jiang Rufeng Zhang Enze Xie Zehuan Yuan Changhu Wang Ping Luo | ViT VOT | 241 564 0 | 31 Dec 2020
Learning Object Permanence from Video | Aviv Shamsian Ofri Kleinfeld Amir Globerson Gal Chechik | SSL | 29 31 0 | 23 Mar 2020
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network | Wenzhe Shi Jose Caballero Ferenc Huszár J. Totz Andrew P. Aitken Rob Bishop Daniel Rueckert Zehan Wang | SupR | 190 5,163 0 | 16 Sep 2016