The Kinetics Human Action Video Dataset

19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,151 papers shown
HSG-12M: A Large-Scale Spatial Multigraph Dataset
Xianquan Yan
Hakan Akgün
Kenji Kawaguchi
N. Duane Loh
Ching Hua Lee
176
1
0
10 Jun 2025
Super Encoding Network: Recursive Association of Multi-Modal Encoders for Video Understanding
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
VOS
338
7
0
09 Jun 2025
Sleep Stage Classification using Multimodal Embedding Fusion from EOG and PSM
Olivier Papillon
Rafik Goubran
James Green
Julien Larivière-Chartier
Caitlin Higginson
Frank Knoefel
Rébecca Robillard
162
0
0
07 Jun 2025
Dream to Generalize: Zero-Shot Model-Based Reinforcement Learning for Unseen Visual Distractions
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jeongsoo Ha
Kyungsoo Kim
Yusung Kim
OffRL, VLM
164
8
0
05 Jun 2025
Towards Holistic Visual Quality Assessment of AI-Generated Videos: A LLM-Based Multi-Dimensional Evaluation Model
Zelu Qi
Ping Shi
C. Zhang
Shuqi Wang
F. Zhao
Da Pan
Zefeng Ying
EGVM, VGen
335
1
0
05 Jun 2025
Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks
Jubayer Ahmed Bhuiyan Shawon
H. Mahmud
Kamrul Hasan
124
0
0
04 Jun 2025
Video, How Do Your Tokens Merge?
Sam Pollard
Michael Wray
ViT, MoMe
241
1
0
04 Jun 2025
Large-scale Self-supervised Video Foundation Model for Intelligent Surgery
Shu Yang
F. Zhou
Leon D. Mayer
Fuxiang Huang
Yiliang Chen
...
Zheng Li
Jing Qin
J. Teoh
Lena Maier-Hein
Hao-tao Chen
231
3
0
03 Jun 2025
HRTR: A Single-stage Transformer for Fine-grained Sub-second Action Segmentation in Stroke Rehabilitation
H. Helvaci
Justin Philip Huber
Jihye Bae
S. Cheung
174
0
0
03 Jun 2025
Fire360: A Benchmark for Robust Perception and Episodic Memory in Degraded 360-Degree Firefighting Videos
Aditi Tiwari
Farzaneh Masoud
Dac Trong Nguyen
Jill Kraft
Heng Ji
Klara Nahrstedt
163
0
0
02 Jun 2025
SemiVT-Surge: Semi-Supervised Video Transformer for Surgical Phase Recognition
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yiping Li
Ronald L.P.D. de Jong
Sahar Nasirihaghighi
Tim J. M. Jaspers
Romy van Jaarsveld
...
Richard van Hillegersberg
Fons van der Sommen
J P Ruurda
M. Breeuwer
Yasmina al Khalil
MedIm
192
3
0
02 Jun 2025
Improving Keystep Recognition in Ego-Video via Dexterous Focus
Zachary Chavis
Stephen J. Guy
Hyun Soo Park
252
1
0
01 Jun 2025
$\texttt{AVROBUSTBENCH}$: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
Sarthak Kumar Maharana
Saksham Singh Kushwaha
Baoming Zhang
Adrian Rodriguez
Songtao Wei
Yapeng Tian
Yunhui Guo
TTA, VLM
257
0
0
31 May 2025
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory
Chenhao Zheng
Jieyu Zhang
Mohammadreza Salehi
Ziqi Gao
Vishnu Iyengar
Norimasa Kobori
Quan Kong
Ranjay Krishna
355
2
0
29 May 2025
Multimodal Federated Learning: A Survey through the Lens of Different FL Paradigms
Yuanzhe Peng
Jieming Bian
Lei Wang
Yin Huang
Jie Xu
203
0
0
27 May 2025
VideoMarkBench: Benchmarking Robustness of Video Watermarking
Zhengyuan Jiang
Moyang Guo
Kecen Li
Yuepeng Hu
Yupu Wang
Zhicong Huang
Cheng Hong
Neil Zhenqiang Gong
AAML
196
0
0
27 May 2025
CA3D: Convolutional-Attentional 3D Nets for Efficient Video Activity Recognition on the Edge
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
150
1
0
26 May 2025
TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs
Juntong Wang
Jiarui Wang
Huiyu Duan
Guangtao Zhai
Xiongkuo Min
173
6
0
26 May 2025
The Role of Video Generation in Enhancing Data-Limited Action Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Wei Li
Dezhao Luo
Dongbao Yang
Zhenhang Li
Weiping Wang
Can Ma
DiffM, VGen
585
0
0
26 May 2025
Inference Compute-Optimal Video Vision Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Peiqi Wang
ShengYun Peng
Xuewen Zhang
Hanchao Yu
Yibo Yang
Lifu Huang
Fujun Liu
Qifan Wang
VLM
271
2
0
24 May 2025
SHARDeg: A Benchmark for Skeletal Human Action Recognition in Degraded Scenarios
Simon Malzard
Nitish Mital
Richard Walters
Victoria Nockles
Raghuveer Rao
Celso M. De Melo
3DH
382
0
0
23 May 2025
Dual Branch VideoMamba with Gated Class Token Fusion for Violence Detection
Damith Chamalke Senadeera
Xiaoyun Yang
Shibo Li
Muhammad Awais
Dimitrios Kollias
Gregory G. Slabaugh
Mamba
177
1
0
23 May 2025
Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation
Moru Liu
Hao Dong
Jessica Kelly
Olga Fink
Mario Trapp
OODD
298
3
0
22 May 2025
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
381
1
0
19 May 2025
Just Dance with $π$! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Computer Vision and Pattern Recognition (CVPR), 2025
Snehashis Majhi
Giacomo D'Amicantonio
A. Dantcheva
Quan Kong
Lorenzo Garattoni
Gianpiero Francesca
Egor Bondarev
Francois Bremond
181
0
0
19 May 2025
GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
Teli Ma
Jia Zheng
Zifan Wang
Ziyao Gao
Jiaming Zhou
Junwei Liang
279
6
0
17 May 2025
Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
Keunwoo Peter Yu
Joyce Chai
MLLM, VLM
259
0
0
16 May 2025
A Fourier Space Perspective on Diffusion Models
Fabian Falck
Teodora Pandeva
Kiarash Zahirnia
Rachel Lawrence
Richard Turner
Edward Meeds
Javier Zazo
Sushrut Karmalkar
DiffM, MedIm
244
12
0
16 May 2025
SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and $\mathcal{O}(T)$ Complexity
Shihao Zou
Qingfeng Li
Wei Ji
Jingjing Li
Yongkui Yang
Guoqi Li
Chao Dong
330
1
0
15 May 2025
Incorporating brain-inspired mechanisms for multimodal learning in artificial intelligence
Xiang He
Dongcheng Zhao
Yang Li
Qingqun Kong
Xin Yang
Yi Zeng
286
0
0
15 May 2025
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Computer Vision and Pattern Recognition (CVPR), 2025
Yung-Hsuan Lai
Janek Ebbers
Yu-Chiang Frank Wang
François Germain
Michael Jeffrey Jones
Moitreya Chatterjee
202
1
0
14 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
322
0
0
13 May 2025
Video Dataset Condensation with Diffusion Models
Zhe Li
Hadrien Reynaud
Mischa Dombrowski
Sarah Cechnicka
Franciskus Xaverius Erick
Bernhard Kainz
DD, VGen
489
1
0
10 May 2025
Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study
Tamim Ahmed
Thanassis Rikakis
155
0
0
03 May 2025
Vehicular Communication Security: Multi-Channel and Multi-Factor Authentication
IEEE Transactions on Vehicular Technology (IEEE Trans. Veh. Technol.), 2025
Marco De Vincenzi
Siyang Song
Chen Bo Calvin Zhang
Manuel Garcia
Shaozu Ding
Chiara Bodei
Ilaria Matteucci
Dajiang Suo
349
1
0
01 May 2025
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
Jiajian Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
308
1
0
30 Apr 2025
MVQA: Mamba with Unified Sampling for Efficient Video Quality Assessment
Yachun Mi
Yu Li
Weicheng Meng
Chong Chen
Chen Hui
Gangyan Zeng
272
1
0
22 Apr 2025
Audio-Visual Class-Incremental Learning for Fish Feeding intensity Assessment in Aquaculture
Meng Cui
Xianghu Yue
Xinyuan Qian
Jinzheng Zhao
Haohe Liu
Xubo Liu
Daoliang Li
Wenwu Wang
327
1
0
21 Apr 2025
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Computer Vision and Pattern Recognition (CVPR), 2025
Ziyi Liu
Wenshu Fan
182
3
0
21 Apr 2025
PCBEAR: Pose Concept Bottleneck for Explainable Action Recognition
Jongseo Lee
Wooil Lee
Gyeong-Moon Park
Seong Tae Kim
Jinwoo Choi
375
1
0
17 Apr 2025
SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation
IEEE Transactions on Multimedia (TMM), 2025
Zongye Zhang
Wenrui Cai
Qingjie Liu
Yanjie Wang
266
0
0
16 Apr 2025
Co-STAR: Collaborative Curriculum Self-Training with Adaptive Regularization for Source-Free Video Domain Adaptation
Amirhossein Dadashzadeh
Parsa Esmati
Majid Mirmehdi
TTA, VLM
354
1
0
15 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
446
0
0
14 Apr 2025
F$^3$Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos
International Conference on Learning Representations (ICLR), 2025
Zhaoyu Liu
Kan Jiang
Murong Ma
Zhe Hou
Yun Lin
Jin Song Dong
264
3
0
11 Apr 2025
Exploring Ordinal Bias in Action Recognition for Instructional Videos
Joochan Kim
Minjoon Jung
Byoung-Tak Zhang
203
0
0
09 Apr 2025
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
International Conference on Multimedia Retrieval (ICMR), 2025
E. Peruzzo
Dejia Xu
Xingqian Xu
Humphrey Shi
Andrii Zadaianchuk
DiffM, VGen
295
2
0
09 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT, SSL
299
0
0
08 Apr 2025
Studying Image Diffusion Features for Zero-Shot Video Object Segmentation
Thanos Delatolas
Vicky S. Kalogeiton
Dim P. Papadopoulos
DiffM, VOS
321
3
0
07 Apr 2025
AsyReC: A Multimodal Graph-based Framework for Spatio-Temporal Asymmetric Dyadic Relationship Classification
Wang Tang
Fethiye Irmak Dogan
Linbo Qing
Hatice Gunes
206
2
0
07 Apr 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Computer Vision and Pattern Recognition (CVPR), 2025
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
You Li
Jing Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM, VGen
548
10
0
07 Apr 2025