1212.0402
Cited By
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
3 December 2012
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
Papers citing
"UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild"
50 / 92 papers shown
Title
The Role of Video Generation in Enhancing Data-Limited Action Understanding
Wei Li
Dezhao Luo
Dongbao Yang
Zhenhang Li
Weiping Wang
Yu Zhou
DiffM
VGen
93
0
0
26 May 2025
Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition
Ping Li
Jianan Ni
Bo Pang
AAML
88
0
0
23 May 2025
Enhancing Adversarial Robustness of Vision Language Models via Adversarial Mixture Prompt Tuning
Shiji Zhao
Qihui Zhu
Shukun Xiong
Shouwei Ruan
Yize Fan
Ranjie Duan
Qing Guo
Xingxing Wei
AAML
VLM
34
0
0
23 May 2025
FDBPL: Faster Distillation-Based Prompt Learning for Region-Aware Vision-Language Models Adaptation
Zherui Zhang
Jiaxin Wu
Changwei Wang
Rongtao Xu
Longzhao Huang
Wenhao Xu
Wenbo Xu
Li Guo
Shibiao Xu
VLM
VPVLM
57
0
0
23 May 2025
Video-GPT via Next Clip Diffusion
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
DiffM
VGen
93
0
0
18 May 2025
Generative Pre-trained Autoregressive Diffusion Transformer
Yuan Zhang
Jiacheng Jiang
Guoqing Ma
Zhiying Lu
Haoyang Huang
Jianlong Yuan
Nan Duan
VGen
68
1
0
12 May 2025
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence
Guanghui Wang
Zhiyong Yang
Ziyi Wang
Shi Wang
Qianqian Xu
Qingming Huang
116
0
0
07 May 2025
Enhancing Target-unspecific Tasks through a Features Matrix
Fangming Cui
Yonggang Zhang
Xuan Wang
Xinmei Tian
Jun Yu
AAML
64
1
0
06 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Ziyi Wang
Tao Jin
DiffM
210
2
0
30 Apr 2025
Latent Video Dataset Distillation
Ning Li
Antai Andy Liu
Jingran Zhang
Justin Cui
DD
VGen
89
0
0
23 Apr 2025
Fast Adversarial Training with Weak-to-Strong Spatial-Temporal Consistency in the Frequency Domain on Videos
Songping Wang
Hanqing Liu
Yueming Lyu
Xiantao Hu
Ziwen He
Wenjie Wang
Caifeng Shan
Lei Wang
AAML
284
0
0
21 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
150
5
0
17 Apr 2025
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
VLM
102
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
80
0
0
02 Apr 2025
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Yuchao Gu
Weijia Mao
Mike Zheng Shou
VGen
104
3
0
25 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
141
3
0
24 Mar 2025
Mitigating Cache Noise in Test-Time Adaptation for Large Vision-Language Models
Haotian Zhai
Xinyu Chen
Can Zhang
Tianming Sha
Ruirui Li
BDL
VLM
112
0
0
24 Mar 2025
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu
Kunlun Xu
Fuchun Sun
Xu Zou
Yuxin Peng
Jiahuan Zhou
VLM
AI4TS
112
2
0
20 Mar 2025
ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos
Haolin Yang
Feilong Tang
Ming Hu
Yulong Li
Junjie Guo
...
Zelin Peng
Junjun He
Zongyuan Ge
Imran Razzak
DiffM
VGen
167
2
0
20 Mar 2025
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
132
1
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
91
0
0
19 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
87
0
0
15 Mar 2025
A Large-Scale Study on Video Action Dataset Condensation
Yang Chen
Sheng Guo
Bo Zheng
Limin Wang
DD
114
2
0
13 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGS
VLM
77
1
0
13 Mar 2025
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou
Mao Ye
Shuaifeng Li
Nianxin Li
Xiatian Zhu
Lei Deng
Hongbin Liu
Zhen Lei
BDL
VLM
TTA
124
1
0
12 Mar 2025
Video Action Differencing
James Burgess
Xiaohan Wang
Yuhui Zhang
Anita Rau
Alejandro Lozano
Lisa Dunlap
Trevor Darrell
Serena Yeung-Levy
VGen
70
0
0
10 Mar 2025
Conformal Predictions for Human Action Recognition with Vision-Language Models
Bary Tim
Fuchs Clément
Macq Benoît
VLM
95
0
0
10 Feb 2025
A Physical Coherence Benchmark for Evaluating Video Generation Models via Optical Flow-guided Frame Prediction
Yongfan Chen
Xiuwen Zhu
Tianyu Li
EGVM
VGen
78
3
0
08 Feb 2025
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Andrew D. Bagdanov
CLIP
VLM
128
4
0
06 Feb 2025
Technical report on label-informed logit redistribution for better domain generalization in low-shot classification with foundation models
Behraj Khan
T. Syed
368
1
0
29 Jan 2025
Can Pose Transfer Models Generate Realistic Human Motion?
Vaclav Knapp
Matyas Bohacek
320
0
0
28 Jan 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
82
1
0
22 Jan 2025
Ditto: Accelerating Diffusion Model via Temporal Value Similarity
Sungbin Kim
Hyunwuk Lee
Wonho Cho
Mincheol Park
Won Woo Ro
79
1
0
20 Jan 2025
MetaNeRV: Meta Neural Representations for Videos with Spatial-Temporal Guidance
Jialong Guo
Ke Liu
Jiangchao Yao
Zhihua Wang
Jiajun Bu
Haishuai Wang
AI4TS
73
1
0
20 Jan 2025
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou
Amine Ouasfi
Vincent Gripon
A. Boukhayma
VLM
89
0
0
19 Jan 2025
Vitron: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
Hao Fei
Shengqiong Wu
Hao Zhang
Tat-Seng Chua
Shuicheng Yan
100
40
0
31 Dec 2024
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee
Soyeong Kwon
Taehwan Kim
89
5
0
31 Dec 2024
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang
Junliang Guo
Xinyi Xie
Tianyu He
Xu Sun
Li Zhao
DRL
VGen
95
3
0
23 Dec 2024
Adapting Image-to-Video Diffusion Models for Large-Motion Frame Interpolation
Luoxu Jin
Hiroshi Watanabe
DiffM
VGen
151
0
0
22 Dec 2024
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning
Long Zhou
Fereshteh Shakeri
Aymen Sadraoui
Mounir Kaaniche
J. Pesquet
Ismail Ben Ayed
VLM
123
0
0
21 Dec 2024
Parallelized Autoregressive Visual Generation
Yanjie Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
116
12
0
19 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
214
1
0
18 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
94
0
0
18 Dec 2024
Empowering the Deaf and Hard of Hearing Community: Enhancing Video Captions Using Large Language Models
Nadeen Fathallah
Monika Bhole
Steffen Staab
100
0
0
30 Nov 2024
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Mohamed Imam
Rufael Fedaku Marew
Jameel Hassan
Mustansar Fiaz
Alham Fikri Aji
Hisham Cholakkal
VLM
390
0
0
28 Nov 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
122
1
0
28 Nov 2024
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li
Bin Lin
Yang Ye
Liuhan Chen
Xinhua Cheng
Shenghai Yuan
Li-xin Yuan
VGen
DiffM
122
18
0
26 Nov 2024
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao
Jiaxin Shi
Hanwang Zhang
Chunping Wang
Jun Xiao
Long Chen
VGen
DiffM
134
1
0
25 Nov 2024
Adversarial Prompt Distillation for Vision-Language Models
Lin Luo
Xin Wang
Bojia Zi
Shihao Zhao
Xingjun Ma
Yu-Gang Jiang
AAML
VLM
103
3
0
22 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
105
0
0
20 Nov 2024