The Kinetics Human Action Video Dataset
arXiv:1705.06950

19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,152 papers shown
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
IEEE Conference on Computer Communications (IEEE INFOCOM), 2025
Linyi Jiang
Silvery Fu
Yifei Zhu
Bo Li
ViT
890
2
0
14 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
392
1
0
12 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
327
0
0
11 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
432
9
0
11 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
554
65
0
10 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
697
11
0
07 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSLMQ
329
0
0
04 Feb 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao
Zhao Song
Chiwun Yang
VGen
511
11
0
01 Feb 2025
Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Rohit Girmaji
Siddharth Jain
Bhav Beri
Sarthak Bansal
Vineet Gandhi
ViT
207
4
0
01 Feb 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Conference on Multimedia Modeling (MMM), 2025
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
450
4
0
22 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
543
120
0
21 Jan 2025
Human Activity Recognition in an Open World
Journal of Artificial Intelligence Research (JAIR), 2022
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer (University of Notre Dame)
390
4
0
17 Jan 2025
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
IEEE/ACM International Conference on Human-Robot Interaction (HRI), 2025
Naval Kishore Mehta
Arvind
Himanshu Kumar
Abeer Banerjee
Sumeet Saurav
Sanjay Singh
199
0
0
10 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Neural Information Processing Systems (NeurIPS), 2024
Luigi Seminara
G. Farinella
Antonino Furnari
488
21
0
10 Jan 2025
MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
577
5
0
10 Jan 2025
OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
L. Ray
Bo Zhou
Sungho Suh
P. Lukowicz
VLM
151
0
0
03 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Yueze Wang
Yan Shu
Bo Zhao
Boya Wu
Junjie Zhou
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
539
11
0
03 Jan 2025
Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection
Ayush Ghadiya
P. Kar
Vishal M. Chudasama
Pankaj Wasnik
355
18
0
31 Dec 2024
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
313
12
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
863
108
0
31 Dec 2024
Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
Meiqi Wu
Kaiqi Huang
Yuanqiang Cai
Shiyu Hu
Yuzhong Zhao
Weiqiang Wang
VGen
210
7
0
27 Dec 2024
Sensitive Image Classification by Vision Transformers
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Hanxian He
Campbell Wilson
Thanh Thi Nguyen
Janis Dalins
ViT
320
1
0
21 Dec 2024
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
Bharadwaj Ravichandran
Alexander Lynch
S. Brockman
Brandon RichardWebster
Dawei Du
A. Hoogs
Christopher Funk
ObjD, VLM
394
0
0
20 Dec 2024
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yunbin Tu
Liang-Sheng Li
Li Su
Qingming Huang
298
1
0
18 Dec 2024
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
941
10
0
18 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
AAAI Conference on Artificial Intelligence (AAAI), 2024
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
367
2
0
18 Dec 2024
Move-in-2D: 2D-Conditioned Human Motion Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Hsin-Ping Huang
Yang Zhou
Jui-Hsien Wang
Difan Liu
Feng Liu
Ming-Hsuan Yang
Zhan Xu
VGen, DiffM
195
4
0
17 Dec 2024
Gramian Multimodal Representation Learning and Alignment
International Conference on Learning Representations (ICLR), 2024
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
463
26
0
16 Dec 2024
Training Strategies for Isolated Sign Language Recognition
Journal of WSCG (WSCG), 2024
Karina Kvanchiani
Roman Kraynov
Elizaveta Petrova
Petr Surovcev
Aleksandr Nagaev
A. Kapitanov
439
1
0
16 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
286
12
0
15 Dec 2024
Repetitive Action Counting with Hybrid Temporal Relation Modeling
IEEE Transactions on Multimedia (IEEE TMM), 2024
Kun Li
Xinge Peng
Dan Guo
Xun Yang
Meng Wang
243
29
0
10 Dec 2024
Policy-shaped prediction: avoiding distractions in model-based reinforcement learning
Neural Information Processing Systems (NeurIPS), 2024
Miles Hutson
Isaac Kauvar
Nick Haber
323
1
0
08 Dec 2024
Reinforcement Learning from Wild Animal Videos
Elliot Chane-Sane
Constant Roux
O. Stasse
Nicolas Mansard
952
1
0
05 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Zhiyong Yang
Xiangyu Yue
MLLM, AuLLM, VLM
279
25
0
03 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Computer Vision and Pattern Recognition (CVPR), 2024
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Boddeti
Du Tran
VLM
635
7
0
02 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
278
4
0
02 Dec 2024
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions
Guanyu Zhou
Xiaohan Yu
Wenxin Huang
Xuemei Jia
Zhuo Zhou
Chia-Wen Lin
CML
399
1
0
24 Nov 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
313
0
0
19 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Computer Vision and Pattern Recognition (CVPR), 2024
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
553
19
0
18 Nov 2024
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do
Munchurl Kim
603
1
0
16 Nov 2024
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
424
7
0
15 Nov 2024
Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network
Sareh Nejad
Anwar Haque
181
6
0
13 Nov 2024
Pay Attention to the Keys: Visual Piano Transcription Using Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Uros Zivanovic
Ivan Pilkov
Carlos Eduardo Cancino-Chacón
ViT
178
0
0
13 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
258
1
0
11 Nov 2024
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
398
3
0
11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)
Computers in Human Behavior (CHB), 2024
Faisal Mehmood
Xin Guo
Enqing Chen
Muhammad Azeem Akbar
A. Khan
Sami Ullah
331
9
0
10 Nov 2024
Improved Video VAE for Latent Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2024
Pingyu Wu
Kai Zhu
Yu Liu
Liming Zhao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
VGen, DiffM
175
18
0
10 Nov 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
European Conference on Computer Vision (ECCV), 2024
P. Kulkarni
Gaurav Kumar Nayak
Mubarak Shah
ViT, AI4TS
192
9
0
10 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
249
27
0
07 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Pengfei Yu
Jiajun Wu
L. Fei-Fei
VLM
290
83
0
07 Nov 2024