ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.12489
  4. Cited By
Video-GPT via Next Clip Diffusion

Video-GPT via Next Clip Diffusion

18 May 2025
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
    DiffM
    VGen
ArXivPDFHTML

Papers citing "Video-GPT via Next Clip Diffusion"

21 / 71 papers shown
Title
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
256
15,081
0
20 Dec 2021
L-Verse: Bidirectional Generation Between Image and Text
L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim
Gwangmo Song
Sihaeng Lee
Sangyun Kim
Yewon Seo
Soonyoung Lee
S. Kim
Honglak Lee
Kyunghoon Bae
40
25
0
22 Nov 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text
  Understanding
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
291
567
0
28 Sep 2021
Diffusion Models Beat GANs on Image Synthesis
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
133
7,639
0
11 May 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
95
1,154
0
01 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
671
28,659
0
26 Feb 2021
Score-Based Generative Modeling through Stochastic Differential
  Equations
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffM
SyDa
262
6,293
0
26 Nov 2020
mT5: A massively multilingual pre-trained text-to-text transformer
mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue
Noah Constant
Adam Roberts
Mihir Kale
Rami Al-Rfou
Aditya Siddhant
Aditya Barua
Colin Raffel
91
2,489
0
22 Oct 2020
Support-set bottlenecks for video-text representation learning
Support-set bottlenecks for video-text representation learning
Mandela Patrick
Po-Yao (Bernie) Huang
Yuki M. Asano
Florian Metze
Alexander G. Hauptmann
João Henriques
Andrea Vedaldi
52
246
0
06 Oct 2020
Memory-augmented Dense Predictive Coding for Video Representation
  Learning
Memory-augmented Dense Predictive Coding for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
56
241
0
03 Aug 2020
Denoising Diffusion Probabilistic Models
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
284
17,550
0
19 Jun 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
469
41,106
0
28 May 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
447
4,662
0
23 Jan 2020
Lower Dimensional Kernels for Video Discriminators
Lower Dimensional Kernels for Video Discriminators
Emmanuel Kahembwe
S. Ramamoorthy
44
50
0
18 Dec 2019
Towards Accurate Generative Models of Video: A New Metric & Challenges
Towards Accurate Generative Models of Video: A New Metric & Challenges
Thomas Unterthiner
Sjoerd van Steenkiste
Karol Kurach
Raphaël Marinier
Marcin Michalski
Sylvain Gelly
EGVM
VGen
55
711
0
03 Dec 2018
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training
  of High-resolution Temporal GAN
Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN
Masaki Saito
Shunta Saito
Masanori Koyama
Sosuke Kobayashi
44
146
0
22 Nov 2018
A Short Note about Kinetics-600
A Short Note about Kinetics-600
João Carreira
Eric Noland
Andras Banki-Horvath
Chloe Hillier
Andrew Zisserman
69
520
0
03 Aug 2018
Image Transformer
Image Transformer
Niki Parmar
Ashish Vaswani
Jakob Uszkoreit
Lukasz Kaiser
Noam M. Shazeer
Alexander Ku
Dustin Tran
ViT
85
1,673
0
15 Feb 2018
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
89
3,666
0
08 Jun 2017
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.2K
76,547
0
18 May 2015
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
92
6,100
0
03 Dec 2012
Previous
12