Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.06950
Cited By
The Kinetics Human Action Video Dataset
19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Kinetics Human Action Video Dataset"
50 / 2,017 papers shown
Title
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
Haozhe Cheng
Chen Ju
Haicheng Wang
Jinxiang Liu
Mengting Chen
Qiang Hu
Xiaoyun Zhang
Yanfeng Wang
DiffM
VLM
45
5
0
23 Apr 2024
Latency-Distortion Tradeoffs in Communicating Classification Results over Noisy Channels
N. Teku
Sudarshan Adiga
Ravi Tandon
53
0
0
22 Apr 2024
TAVGBench: Benchmarking Text to Audible-Video Generation
Yuxin Mao
Xuyang Shen
Jing Zhang
Zhen Qin
Jinxing Zhou
Mochu Xiang
Yiran Zhong
Yuchao Dai
48
11
0
22 Apr 2024
STAT: Towards Generalizable Temporal Action Localization
Yangcen Liu
Ziyi Liu
Yuanhao Zhai
Wen Li
David Doerman
Junsong Yuan
36
2
0
20 Apr 2024
PhysDreamer: Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang
Hong-Xing Yu
Rundi Wu
Brandon Yushan Feng
Changxi Zheng
Noah Snavely
Jiajun Wu
William T. Freeman
AI4CE
VGen
84
62
0
19 Apr 2024
Aligning Actions and Walking to LLM-Generated Textual Descriptions
Radu Chivereanu
Adrian Cosma
Andy Catruna
R. Rughinis
I. Radoi
57
2
0
18 Apr 2024
Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
Xunsong Li
Pengzhan Sun
Yangcen Liu
Lixin Duan
Wen Li
50
3
0
18 Apr 2024
Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition
Marah Halawa
Florian Blume
Pia Bideau
Martin Maier
Rasha Abdel Rahman
Olaf Hellwich
CVBM
41
1
0
16 Apr 2024
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar
Arya Bakhtiar
Danny Tran
Antonio Loquercio
Jathushan Rajasegaran
Yann LeCun
Amir Globerson
Trevor Darrell
EgoV
46
4
0
15 Apr 2024
Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition
Masato Tamura
29
2
0
15 Apr 2024
Learning Tracking Representations from Single Point Annotations
Qiangqiang Wu
Antoni B. Chan
33
1
0
15 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
55
2
0
15 Apr 2024
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang
Ping Wei
Huan Li
Ziyang Ren
56
8
0
14 Apr 2024
Exploring Explainability in Video Action Recognition
Avinab Saha
Shashank Gupta
S. Ankireddy
Karl Chahine
Joydeep Ghosh
37
0
0
13 Apr 2024
ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition
Otto Brookes
Majid Mirmehdi
H. Kühl
T. Burghardt
35
3
0
13 Apr 2024
Multimodal Attack Detection for Action Recognition Models
Furkan Mumcu
Yasin Yılmaz
AAML
33
1
0
13 Apr 2024
ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos
Sharana Dharshikgan Suresh Dass
H. Barua
Ganesh Krishnasamy
Raveendran Paramesran
Raphael C.-W. Phan
ViT
51
2
0
09 Apr 2024
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk
Jaesung Huh
Evangelos Kazakos
Andrew Zisserman
Dima Damen
48
9
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
68
7
0
07 Apr 2024
Study of the effect of Sharpness on Blind Video Quality Assessment
Anantha Prabhu
David Pratap
Narayana Darapeni
R. AnweshP
28
0
0
06 Apr 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
59
3
0
06 Apr 2024
Learning Correlation Structures for Vision Transformers
Manjin Kim
Paul Hongsuck Seo
Cordelia Schmid
Minsu Cho
ViT
42
7
0
05 Apr 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
44
1
0
03 Apr 2024
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu
Sicheng Mo
Yin Li
47
9
0
02 Apr 2024
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He
Qiaochu Huang
Zhensong Zhang
Zhiwei Lin
Zhiyong Wu
Sicheng Yang
Minglei Li
Zhiyi Chen
Songcen Xu
Xiaofei Wu
35
15
0
02 Apr 2024
360+x: A Panoptic Multi-modal Scene Understanding Dataset
Hao Chen
Yuqi Hou
Chenyuan Qu
Irene Testini
Xiaohan Hong
Jianbo Jiao
37
7
0
01 Apr 2024
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu
Chen Li
Haoran Tang
Yixiao Ge
Ying Shan
Ge Li
54
70
0
30 Mar 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
45
1
0
28 Mar 2024
Frame by Familiar Frame: Understanding Replication in Video Diffusion Models
Aimon Rahman
Malsha V. Perera
Vishal M. Patel
VGen
56
7
0
28 Mar 2024
PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
Edward Fish
Jon Weinbren
Andrew Gilbert
57
1
0
27 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
79
14
0
26 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
ViT
48
6
0
26 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
43
2
0
24 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
48
4
0
24 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
150
319
0
21 Mar 2024
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek
Horst Possegger
Dominik Narnhofer
Horst Bischof
Mateusz Koziñski
18
6
0
21 Mar 2024
Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic
Henghui Zhao
Thomas Pock
Richard P. Wildes
PICV
AAML
44
2
0
19 Mar 2024
Dynamic Spatial-Temporal Aggregation for Skeleton-Aware Sign Language Recognition
Lianyu Hu
Liqing Gao
Zekang Liu
Wei Feng
SLR
42
1
0
19 Mar 2024
VideoBadminton: A Video Dataset for Badminton Action Recognition
Qi Li
Tzu-Chen Chiu
Hsiang-Wei Huang
Minmin Sun
Wei-Shinn Ku
36
3
0
19 Mar 2024
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
Tingbing Yan
Wenzheng Zeng
Yang Xiao
Xingyu Tong
Bo Tan
Zhiwen Fang
Zhiguo Cao
Qiufeng Wang
36
5
0
15 Mar 2024
AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
Yucen Wang
Shenghua Wan
Le Gan
Shuai Feng
De-Chuan Zhan
VGen
27
4
0
15 Mar 2024
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang
Shenyuan Gao
Yihang Qiu
Li Chen
Tianyu Li
...
Ping Luo
Jun Zhang
Andreas Geiger
Yu Qiao
Hongyang Li
VGen
73
58
0
14 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Jeonghyeok Do
Munchurl Kim
ViT
48
18
0
14 Mar 2024
Don't Judge by the Look: Towards Motion Coherent Video Representation
Yitian Zhang
Yue Bai
Huan Wang
Yizhou Wang
Yun Fu
42
0
0
14 Mar 2024
EquiAV: Leveraging Equivariance for Audio-Visual Contrastive Learning
Jongsuk Kim
Hyeongkeun Lee
Kyeongha Rho
Junmo Kim
Joon Son Chung
36
4
0
14 Mar 2024
Spatiotemporal Representation Learning for Short and Long Medical Image Time Series
Chengzhi Shen
M. Menten
Hrvoje Bogunović
U. Schmidt-Erfurth
H. Scholl
S. Sivaprasad
A. Lotery
Daniel Rueckert
Paul Hager
Robbie Holland
42
2
0
12 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
42
184
0
11 Mar 2024
Density-Guided Label Smoothing for Temporal Localization of Driving Actions
Tunç Alkanat
Erkut Akdag
Egor Bondarev
Peter H. N. de With
43
4
0
11 Mar 2024
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition
Erkut Akdag
Zeqi Zhu
Egor Bondarev
Peter H. N. de With
ViT
42
5
0
11 Mar 2024
Previous
1
2
3
...
6
7
8
...
39
40
41
Next