1212.0402
Cited By
UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
3 December 2012
K. Soomro
Amir Zamir
M. Shah
CLIP
VGen
Papers citing
"UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild"
42 / 92 papers shown
Title
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Mian
Joey Tianyi Zhou
Chen Chen
LRM
78
2
0
15 Nov 2024
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
96
2
0
14 Nov 2024
Investigating Memorization in Video Diffusion Models
Chong Chen
Enhuai Liu
Daochang Liu
M. Shah
Chang Xu
VGen
DiffM
93
1
0
29 Oct 2024
Tree of Attributes Prompt Learning for Vision-Language Models
Tong Ding
Wanhua Li
Zhongqi Miao
Hanspeter Pfister
VLM
83
1
0
15 Oct 2024
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
Muhammad Jehanzeb Mirza
Mengjie Zhao
Zhuoyuan Mao
Sivan Doveh
Wei Lin
...
Yuki Mitsufuji
Horst Possegger
Rogerio Feris
Leonid Karlinsky
James Glass
VLM
116
1
0
08 Oct 2024
Restructuring Vector Quantization with the Rotation Trick
Christopher Fifty
Ronald G. Junkins
Dennis Duan
Aniketh Iyer
Jerry W. Liu
Ehsan Amid
Sebastian Thrun
Christopher Ré
LLMSV
70
12
0
08 Oct 2024
Understanding and Mitigating Miscalibration in Prompt Tuning for Vision-Language Models
Shuoyuan Wang
Yixuan Li
Hongxin Wei
VLM
70
2
0
03 Oct 2024
Dynamic Sparse Training versus Dense Training: The Unexpected Winner in Image Corruption Robustness
Boqian Wu
Q. Xiao
Shunxin Wang
N. Strisciuglio
Mykola Pechenizkiy
M. V. Keulen
Decebal Constantin Mocanu
Elena Mocanu
OOD
3DH
110
2
0
03 Oct 2024
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Zhiyu Tan
Hao Li
Jingjing Chen
MLLM
75
18
0
25 Sep 2024
Exploring Information-Theoretic Metrics Associated with Neural Collapse in Supervised Training
Kun Song
Zhiquan Tan
Bochao Zou
Jiansheng Chen
Huimin Ma
Weiran Huang
64
1
0
25 Sep 2024
Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo
Fengyuan Shi
Yixiao Ge
Yujiu Yang
Limin Wang
Ying Shan
VLM
72
54
0
06 Sep 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
96
8
0
13 Aug 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
99
0
0
28 Jul 2024
A Comprehensive Review of Few-shot Action Recognition
Yuyang Wanyan
Xiaoshan Yang
Weiming Dong
Changsheng Xu
VLM
114
3
0
20 Jul 2024
MMAD: Multi-label Micro-Action Detection in Videos
Kun Li
Pengyu Liu
Guoliang Chen
Zhiliang Wu
Hehe Fan
Meng Wang
78
7
0
07 Jul 2024
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation
Kepan Nan
Rui Xie
Penghao Zhou
Tiehan Fan
Zhenheng Yang
Zhijie Chen
Xiang Li
Jian Yang
Ying Tai
111
76
0
02 Jul 2024
FRAG: Frequency Adapting Group for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Geonwoo Kim
Chang D. Yoo
DiffM
56
5
0
10 Jun 2024
ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Tianchen Zhao
Tongcheng Fang
Haofeng Huang
Enshu Liu
Widyadewi Soedarmadji
...
Shengen Yan
Huazhong Yang
Xuefei Ning
Yu Wang
MQ
VGen
128
29
0
04 Jun 2024
Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding
Rong Gao
Xin Liu
Bohao Xing
Zitong Yu
Björn W. Schuller
Heikki Kälviäinen
99
3
0
21 May 2024
Contextual Emotion Recognition using Large Vision Language Models
Yasaman Etesam
Özge Nilay Yalçin
Chuxuan Zhang
Angelica Lim
VLM
84
3
0
14 May 2024
MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition
Hongyu Qu
Rui Yan
Xiangbo Shu
Haoliang Gao
Peng Huang
Guo-Sen Xie
81
4
0
03 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP
VLM
84
23
0
30 Apr 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Yue Fan
Qing Li
Yuntao Du
VLM
82
79
0
03 Feb 2024
Collaboratively Self-supervised Video Representation Learning for Action Recognition
Jie Zhang
Zhifan Wan
Lanqing Hu
Stephen Lin
Shuzhe Wu
Shiguang Shan
TTA
81
1
0
15 Jan 2024
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
60
4
0
05 Dec 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Nicolas Padoy
57
22
0
27 Jul 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
De-Chuan Zhan
Ziwei Liu
VLM
CLL
100
41
0
30 May 2023
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
156
1,011
0
09 Oct 2021
ConvGRU in Fine-grained Pitching Action Recognition for Action Outcome Prediction
Tianqi Ma
Lin Zhang
Xiumin Diao
Ou Ma
25
3
0
18 Aug 2020
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework
Li Tao
Xueting Wang
T. Yamasaki
SSL
46
106
0
06 Aug 2020
Labelling unlabelled videos from scratch with multi-modal self-supervision
Yuki M. Asano
Mandela Patrick
Christian Rupprecht
Andrea Vedaldi
SSL
40
152
0
24 Jun 2020
Multiple Video Frame Interpolation via Enhanced Deformable Separable Convolution
Xianhang Cheng
Zhenzhong Chen
100
131
0
15 Jun 2020
IMUTube: Automatic Extraction of Virtual on-body Accelerometry from Video for Human Activity Recognition
Hyeokhyen Kwon
C. Tong
H. Haresamudram
Yan Gao
G. Abowd
Nicholas D. Lane
Thomas Ploetz
52
83
0
29 May 2020
Guided Weak Supervision for Action Recognition with Scarce Data to Assess Skills of Children with Autism
Prashant Pandey
Prathosh A. P.
Manu Kohli
Joshua K. Pritchard
49
33
0
11 Nov 2019
Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding
Mathew Monfort
Bowen Pan
K. Ramakrishnan
A. Andonian
Barry A. McNamara
A. Lascelles
Quanfu Fan
Dan Gutfreund
Rogerio Feris
A. Oliva
VLM
48
68
0
01 Nov 2019
Discovering Spatio-Temporal Action Tubes
Yuancheng Ye
Xiaodong Yang
Yingli Tian
42
14
0
29 Nov 2018
Adaptive Detrending to Accelerate Convolutional Gated Recurrent Unit Training for Contextual Video Recognition
Minju Jung
Haanvid Lee
Jun Tani
AI4TS
40
42
0
24 May 2017
Action Tubelet Detector for Spatio-Temporal Action Localization
Vicky Kalogeiton
Philippe Weinzaepfel
V. Ferrari
Cordelia Schmid
52
324
0
04 May 2017
Transformation-Based Models of Video Sequences
Joost R. van Amersfoort
A. Kannan
Marc'Aurelio Ranzato
Arthur Szlam
Du Tran
Soumith Chintala
ViT
36
76
0
29 Jan 2017
Deep Motion Features for Visual Tracking
Susanna Gladh
Martin Danelljan
Fahad Shahbaz Khan
Michael Felsberg
50
89
0
20 Dec 2016
Asynchronous Temporal Fields for Action Recognition
Gunnar Sigurdsson
S. Divvala
Ali Farhadi
Abhinav Gupta
BDL
50
170
0
19 Dec 2016
DAP3D-Net: Where, What and How Actions Occur in Videos?
Li Liu
Yi Zhou
Ling Shao
37
14
0
10 Feb 2016