2505.19495
Cited By
The Role of Video Generation in Enhancing Data-Limited Action Understanding
26 May 2025
Wei Li
Dezhao Luo
Dongbao Yang
Zhenhang Li
Weiping Wang
Yu Zhou
DiffM
VGen
Papers citing "The Role of Video Generation in Enhancing Data-Limited Action Understanding"
50 / 69 papers shown
Open-Sora: Democratizing Efficient Video Production for All
Zangwei Zheng
Xiangyu Peng
Tianji Yang
Chenhui Shen
Shenggui Li
Hongxin Liu
Yukun Zhou
Tianyi Li
Yang You
VGen
49
223
0
31 Dec 2024
Open-Sora Plan: Open-Source Large Video Generation Model
Bin Lin
Yunyang Ge
Xinhua Cheng
Zongjian Li
Bin Zhu
...
Zhang Pan
Xing Zhou
Shaoling Dong
Yonghong Tian
Li-xin Yuan
VLM
VGen
139
68
0
28 Nov 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
111
453
0
12 Aug 2024
Do Generated Data Always Help Contrastive Learning?
Yifei Wang
Jizhe Zhang
Yisen Wang
DiffM
51
23
0
19 Mar 2024
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Yixin Liu
Kai Zhang
Yuan Li
Zhiling Yan
Chujie Gao
...
Yue Huang
Hanchi Sun
Jianfeng Gao
Lifang He
Lichao Sun
VLM
VGen
EGVM
88
274
0
27 Feb 2024
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng
Yujie Zhong
Zequn Jie
Weidi Xie
Lin Ma
ObjD
54
14
0
08 Feb 2024
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition
Xiaohui Huang
Hao Zhou
Kun Yao
Kai Han
VLM
78
22
0
05 Feb 2024
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
43
9
0
30 Nov 2023
Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models
Dezhao Luo
Jiabo Huang
Shaogang Gong
Hailin Jin
Yang Liu
VLM
63
10
0
01 Sep 2023
Orthogonal Temporal Interpolation for Zero-Shot Video Recognition
Yan Zhu
Junbao Zhuo
B. Ma
Jiajia Geng
Xiaoming Wei
Xiaolin K. Wei
Shuhui Wang
VLM
35
6
0
14 Aug 2023
Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning
Chun-Mei Feng
Kai Yu
Yong Liu
Salman Khan
W. Zuo
VLM
40
80
0
11 Aug 2023
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Hang Zhang
Xin Li
Lidong Bing
MLLM
108
987
0
05 Jun 2023
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
Yonglong Tian
Lijie Fan
Phillip Isola
Huiwen Chang
Dilip Krishnan
VLM
DiffM
64
147
0
01 Jun 2023
Training on Thin Air: Improve Image Classification with Generated Data
Yongchao Zhou
Hshmat Sahak
Jimmy Ba
DiffM
32
47
0
24 May 2023
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Hassan Akbari
Dan Kondratyuk
Huayu Chen
Rachel Hornung
Haoran Wang
Hartwig Adam
VLM
MoE
35
12
0
10 May 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
53
76
0
06 Apr 2023
Use Your Head: Improving Long-Tail Video Recognition
Toby Perrett
Saptarshi Sinha
T. Burghardt
Majid Mirmehdi
Dima Damen
69
16
0
03 Apr 2023
Unbiased Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Hui Lv
Zhongqi Yue
Qianru Sun
Bin Luo
Zhen Cui
Hanwang Zhang
WSOD
47
68
0
22 Mar 2023
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
Wei Lin
Leonid Karlinsky
Nina Shvetsova
Horst Possegger
Mateusz Koziński
Yikang Shen
Rogerio Feris
Hilde Kuehne
Horst Bischof
VLM
114
39
0
15 Mar 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
394
13,788
0
15 Mar 2023
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
Dezhao Luo
Jiabo Huang
S. Gong
Hailin Jin
Yang Liu
VGen
36
29
0
28 Feb 2023
Effective Data Augmentation With Diffusion Models
Brandon Trabucco
Kyle Doherty
Max Gurinas
Ruslan Salakhutdinov
VLM
DiffM
43
245
0
07 Feb 2023
Diversity is Definitely Needed: Improving Model-Agnostic Zero-shot Classification via Stable Diffusion
Jordan Shipard
Arnold Wiliem
Kien Nguyen Thanh
Wei Xiang
Clinton Fookes
DiffM
26
77
0
07 Feb 2023
Structure and Content-Guided Video Synthesis with Diffusion Models
Patrick Esser
Johnathan Chiu
Parmida Atighehchian
Jonathan Granskog
Anastasis Germanidis
DiffM
VGen
125
517
0
06 Feb 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
125
52
0
31 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
CLIP
VLM
72
155
0
06 Dec 2022
Is synthetic data from generative models ready for image recognition?
Ruifei He
Shuyang Sun
Xin Yu
Chuhui Xue
Wenqing Zhang
Philip Torr
Song Bai
Xiaojuan Qi
69
294
0
14 Oct 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
67
319
0
04 Aug 2022
Video Diffusion Models
Jonathan Ho
Tim Salimans
Alexey A. Gritsenko
William Chan
Mohammad Norouzi
David J. Fleet
DiffM
VGen
117
1,563
0
07 Apr 2022
Kubric: A scalable dataset generator
Klaus Greff
Francois Belletti
Lucas Beyer
Carl Doersch
Yilun Du
...
Ziyu Wang
Tianhao Wu
K. M. Yi
Fangcheng Zhong
Andrea Tagliasacchi
62
255
0
07 Mar 2022
BigDatasetGAN: Synthesizing ImageNet with Pixel-wise Annotations
Daiqing Li
Huan Ling
Seung Wook Kim
Karsten Kreis
Adela Barriuso
Sanja Fidler
Antonio Torralba
87
106
0
12 Jan 2022
Prompting Visual-Language Models for Efficient Video Understanding
Chen Ju
Tengda Han
Kunhao Zheng
Ya Zhang
Weidi Xie
VPVLM
VLM
47
371
0
08 Dec 2021
Label-Efficient Semantic Segmentation with Diffusion Models
Dmitry Baranchuk
Ivan Rubachev
A. Voynov
Valentin Khrulkov
Artem Babenko
DiffM
VLM
205
526
0
06 Dec 2021
Improving Robustness using Generated Data
Sven Gowal
Sylvestre-Alvise Rebuffi
Olivia Wiles
Florian Stimberg
D. A. Calian
Timothy A. Mann
43
297
0
18 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
335
1,056
0
13 Oct 2021
Ensembling with Deep Generative Views
Lucy Chai
Jun-Yan Zhu
Eli Shechtman
Phillip Isola
Richard Y. Zhang
GAN
40
71
0
29 Apr 2021
Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello
W. Fedus
Xianzhi Du
E. D. Cubuk
A. Srinivas
Nayeon Lee
Jonathon Shlens
Barret Zoph
48
299
0
13 Mar 2021
Repurposing GANs for One-shot Semantic Part Segmentation
Nontawat Tritrong
Pitchaporn Rewatbowornwong
Supasorn Suwajanakorn
64
109
0
07 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
548
28,659
0
26 Feb 2021
Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning
Yu Tian
Guansong Pang
Yuanhong Chen
Rajvinder Singh
Johan Verjans
G. Carneiro
AI4TS
29
298
0
25 Jan 2021
VideoMix: Rethinking Data Augmentation for Video Classification
Sangdoo Yun
Seong Joon Oh
Byeongho Heo
Dongyoon Han
Jinhyung Kim
386
74
0
07 Dec 2020
Localizing Anomalies from Weakly-Labeled Videos
Hui Lv
Chuanwei Zhou
Chunyan Xu
Zhen Cui
Jian Yang
30
116
0
20 Aug 2020
Learning Temporally Invariant and Localizable Features via Data Augmentation for Video Recognition
Taeoh Kim
Hyeongmin Lee
Myeongah Cho
Hankook Lee
Dong Heon Cho
Sangyoun Lee
45
25
0
13 Aug 2020
ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation
Chuang Gan
Jeremy Schwartz
S. Alter
Damian Mrowca
Martin Schrimpf
...
Antonio Torralba
J. DiCarlo
J. Tenenbaum
Josh H. McDermott
Daniel L. K. Yamins
VGen
110
308
0
09 Jul 2020
Not only Look, but also Listen: Learning Multimodal Violence Detection under Weak Supervision
Peng Wu
Jing Liu
Yujia Shi
Yujia Sun
Fang Shao
Zhaoyang Wu
Zhiwei Yang
26
308
0
09 Jul 2020
Rescaling Egocentric Vision
Dima Damen
Hazel Doughty
G. Farinella
Antonino Furnari
Evangelos Kazakos
...
Davide Moltisanti
Jonathan Munro
Toby Perrett
Will Price
Michael Wray
EgoV
33
444
0
23 Jun 2020
The AVA-Kinetics Localized Human Actions Video Dataset
Ang Li
Meghana Thotakuri
David A. Ross
João Carreira
Alexander Vostrikov
Andrew Zisserman
VGen
26
135
0
01 May 2020
Virtual KITTI 2
Yohann Cabon
Naila Murray
Martin Humenberger
3DPC
22
280
0
29 Jan 2020
GODS: Generalized One-class Discriminative Subspaces for Anomaly Detection
Jue Wang
A. Cherian
CML
29
114
0
16 Aug 2019
Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling
Jia Zheng
Junfei Zhang
Jing Li
Rui Tang
Shenghua Gao
Zihan Zhou
3DV
46
268
0
01 Aug 2019