1705.06950
Cited By
The Kinetics Human Action Video Dataset
19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
Papers citing "The Kinetics Human Action Video Dataset"
50 / 2,152 papers shown
An Effective End-to-End Solution for Multimodal Action Recognition
International Conference on Pattern Recognition (ICPR), 2025
Songping Wang
Xiantao Hu
Yueming Lyu
Caifeng Shan
236
2
0
11 Jun 2025
HSG-12M: A Large-Scale Spatial Multigraph Dataset
Xianquan Yan
Hakan Akgün
Kenji Kawaguchi
N. Duane Loh
Ching Hua Lee
201
1
0
10 Jun 2025
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
VOS
382
7
0
09 Jun 2025
Sleep Stage Classification using Multimodal Embedding Fusion from EOG and PSM
Olivier Papillon
Rafik Goubran
James Green
Julien Larivière-Chartier
Caitlin Higginson
Frank Knoefel
Rébecca Robillard
187
0
0
07 Jun 2025
Dream to Generalize: Zero-Shot Model-Based Reinforcement Learning for Unseen Visual Distractions
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jeongsoo Ha
Kyungsoo Kim
Yusung Kim
OffRL
VLM
177
10
0
05 Jun 2025
Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model
Zelu Qi
Ping Shi
C. Zhang
Shuqi Wang
F. Zhao
Da Pan
Zefeng Ying
EGVM
VGen
363
1
0
05 Jun 2025
Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks
Jubayer Ahmed Bhuiyan Shawon
H. Mahmud
Kamrul Hasan
152
0
0
04 Jun 2025
Video, How Do Your Tokens Merge?
Sam Pollard
Michael Wray
ViT
MoMe
270
1
0
04 Jun 2025
HRTR: A Single-stage Transformer for Fine-grained Sub-second Action Segmentation in Stroke Rehabilitation
H. Helvaci
Justin Philip Huber
Jihye Bae
S. Cheung
205
0
0
03 Jun 2025
Large-scale Self-supervised Video Foundation Model for Intelligent Surgery
Shu Yang
F. Zhou
Leon D. Mayer
Fuxiang Huang
Yiliang Chen
...
Zheng Li
Jing Qin
J. Teoh
Lena Maier-Hein
Hao-tao Chen
254
3
0
03 Jun 2025
Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360-Degree Firefighting Videos
Aditi Tiwari
Farzaneh Masoud
Dac Trong Nguyen
Jill Kraft
Heng Ji
Klara Nahrstedt
178
0
0
02 Jun 2025
SemiVT-Surge: Semi-Supervised Video Transformer for Surgical Phase Recognition
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yiping Li
Ronald L.P.D. de Jong
Sahar Nasirihaghighi
Tim J. M. Jaspers
Romy van Jaarsveld
...
Richard van Hillegersberg
Fons van der Sommen
J P Ruurda
M. Breeuwer
Yasmina al Khalil
MedIm
210
3
0
02 Jun 2025
Improving Keystep Recognition in Ego-Video via Dexterous Focus
Zachary Chavis
Stephen J. Guy
Hyun Soo Park
260
1
0
01 Jun 2025
AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
Sarthak Kumar Maharana
Saksham Singh Kushwaha
Baoming Zhang
Adrian Rodriguez
Songtao Wei
Yapeng Tian
Yunhui Guo
TTA
VLM
291
0
0
31 May 2025
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng
Jieyu Zhang
Mohammadreza Salehi
Ziqi Gao
Vishnu Iyengar
Norimasa Kobori
Quan Kong
Ranjay Krishna
378
2
0
29 May 2025
Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms
Yuanzhe Peng
Jieming Bian
Lei Wang
Yin Huang
Jie Xu
208
1
0
27 May 2025
VideoMarkBench: Benchmarking Robustness of Video Watermarking
Zhengyuan Jiang
Moyang Guo
Kecen Li
Yuepeng Hu
Yupu Wang
Zhicong Huang
Cheng Hong
Neil Zhenqiang Gong
AAML
220
0
0
27 May 2025
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs
Juntong Wang
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Xiongkuo Min
182
7
0
26 May 2025
CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
158
1
0
26 May 2025
The Role of Video Generation in Enhancing Data-Limited Action Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Wei Li
Dezhao Luo
Dongbao Yang
Zhenhang Li
Weiping Wang
Can Ma
DiffM
VGen
614
0
0
26 May 2025
Inference Compute-Optimal Video Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Peiqi Wang
ShengYun Peng
Xuewen Zhang
Hanchao Yu
Yibo Yang
Lifu Huang
Fujun Liu
Qifan Wang
VLM
277
2
0
24 May 2025
Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
Damith Chamalke Senadeera
Xiaoyun Yang
Shibo Li
Muhammad Awais
Dimitrios Kollias
Gregory G. Slabaugh
Mamba
221
1
0
23 May 2025
SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios
Simon Malzard
Nitish Mital
Richard Walters
Victoria Nockles
Raghuveer Rao
Celso M. De Melo
3DH
423
0
0
23 May 2025
Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation
Moru Liu
Hao Dong
Jessica Kelly
Olga Fink
Mario Trapp
OODD
302
3
0
22 May 2025
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
421
1
0
19 May 2025
Just Dance with π! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2025
Snehashis Majhi
Giacomo D'Amicantonio
A. Dantcheva
Quan Kong
Lorenzo Garattoni
Gianpiero Francesca
Egor Bondarev
Francois Bremond
198
0
0
19 May 2025
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
Teli Ma
Jia Zheng
Zifan Wang
Ziyao Gao
Jiaming Zhou
Junwei Liang
309
8
0
17 May 2025
A Fourier Space Perspective on Diffusion Models
Fabian Falck
Teodora Pandeva
Kiarash Zahirnia
Rachel Lawrence
Richard Turner
Edward Meeds
Javier Zazo
Sushrut Karmalkar
DiffM
MedIm
265
15
0
16 May 2025
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
Keunwoo Peter Yu
Joyce Chai
MLLM
VLM
289
0
0
16 May 2025
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Shihao Zou
Qingfeng Li
Wei Ji
Jingjing Li
Yongkui Yang
Guoqi Li
Chao Dong
358
1
0
15 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He
Dongcheng Zhao
Yang Li
Qingqun Kong
Xin Yang
Yi Zeng
291
0
0
15 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Computer Vision and Pattern Recognition (CVPR), 2025
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
222
1
0
14 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
350
0
0
13 May 2025
Video Dataset Condensation with Diffusion Models
Zhe Li
Hadrien Reynaud
Mischa Dombrowski
Sarah Cechnicka
Franciskus Xaverius Erick
Bernhard Kainz
DD
VGen
504
1
0
10 May 2025
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Tamim Ahmed
Thanassis Rikakis
191
0
0
03 May 2025
Vehicular Communication Security: Multi-Channel and Multi-Factor Authentication
IEEE Transactions on Vehicular Technology (IEEE Trans. Veh. Technol.), 2025
Marco De Vincenzi
Siyang Song
Chen Bo Calvin Zhang
Manuel Garcia
Shaozu Ding
Chiara Bodei
Ilaria Matteucci
Dajiang Suo
381
1
0
01 May 2025
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
Jiajian Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
341
1
0
30 Apr 2025
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment
Yachun Mi
Yu Li
Weicheng Meng
Chong Chen
Chen Hui
Gangyan Zeng
296
1
0
22 Apr 2025
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
343
1
0
21 Apr 2025
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Computer Vision and Pattern Recognition (CVPR), 2025
Ziyi Liu
Wenshu Fan
198
3
0
21 Apr 2025
PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition
Jongseo Lee
Wooil Lee
Gyeong-Moon Park
Seong Tae Kim
Jinwoo Choi
383
1
0
17 Apr 2025
SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation
IEEE transactions on multimedia (TMM), 2025
Zongye Zhang
Wenrui Cai
Qingjie Liu
Yanjie Wang
286
0
0
16 Apr 2025
Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation
Amirhossein Dadashzadeh
Parsa Esmati
Majid Mirmehdi
TTA
VLM
409
1
0
15 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
495
0
0
14 Apr 2025
F$^3$Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
International Conference on Learning Representations (ICLR), 2025
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhe Hou
Yun Lin
Jin Song Dong
296
3
0
11 Apr 2025
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
International Conference on Multimedia Retrieval (ICMR), 2025
E. Peruzzo
Dejia Xu
Xingqian Xu
Humphrey Shi
Andrii Zadaianchuk
DiffM
VGen
328
2
0
09 Apr 2025
Exploring Ordinal Bias in Action Recognition for Instructional Videos
Joochan Kim
Minjoon Jung
Byoung-Tak Zhang
243
0
0
09 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT
SSL
346
0
0
08 Apr 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Computer Vision and Pattern Recognition (CVPR), 2025
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
You Li
Jing Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
587
11
0
07 Apr 2025
Studying Image Diffusion Features for Zero-Shot Video Object Segmentation
Thanos Delatolas
Vicky S. Kalogeiton
Dim P. Papadopoulos
DiffM
VOS
335
3
0
07 Apr 2025