The Kinetics Human Action Video Dataset


19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,151 papers shown
Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition
Masato Kobayashi
Ning Ding
Toru Tamaki
104
1
0
27 Sep 2025
Category Discovery: An Open-World Perspective
Zhenqi He
Yuanpei Liu
Kai Han
238
1
0
26 Sep 2025
Every Subtlety Counts: Fine-grained Person Independence Micro-Action Recognition via Distributionally Robust Optimization
Feng-Qi Cui
Jinyang Huang
Anyang Tong
Ziyu Jia
Jie Zhang
Zhi Liu
Dan Guo
Jianwei Lu
Meng Wang
164
0
0
25 Sep 2025
VC-Agent: An Interactive Agent for Customized Video Dataset Collection
Yidan Zhang
Mutian Xu
Yiming Hao
Kun Zhou
Jiahao Chang
Xiaoqiang Liu
Pengfei Wan
Hongbo Fu
Xiaoguang Han
VGen
164
0
0
25 Sep 2025
MoCLIP-Lite: Efficient Video Recognition by Fusing CLIP with Motion Vectors
Binhua Huang
Nan Wang
Arjun Parakash
Soumyabrata Dev
CLIP, VLM
85
0
0
21 Sep 2025
KRAST: Knowledge-Augmented Robotic Action Recognition with Structured Text for Vision-Language Models
Son Hai Nguyen
Diwei Wang
Jinhyeok Jang
Hyewon Seo
98
0
0
19 Sep 2025
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
Boyu Han
Qianqian Xu
Shilong Bao
Zhiyong Yang
Sicong Li
Qingming Huang
EgoV, MoE
407
0
0
16 Sep 2025
More performant and scalable: Rethinking contrastive vision-language pre-training of radiology in the LLM era
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Yingtai Li
Haoran Lai
Xiaoqian Zhou
Shuai Ming
Wenxin Ma
Weifu Lv
S. Kevin Zhou
MedIm, LM&MA, VLM
91
0
0
16 Sep 2025
ResidualViT for Efficient Temporally Dense Video Encoding
Mattia Soldan
Fabian Caba Heilbron
Bernard Ghanem
Josef Sivic
Bryan C. Russell
167
0
0
16 Sep 2025
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV, VGen, AI4TS
225
0
0
11 Sep 2025
Video-Based MPAA Rating Prediction: An Attention-Driven Hybrid Architecture Using Contrastive Learning
Dipta Neogi
Nourash Azmine Chowdhury
Muhammad Rafsan Kabir
Mohammad Ashrafuzzaman Khan
64
0
0
08 Sep 2025
DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation
Haitao Tian
Pierre Payeur
152
0
0
05 Sep 2025
DynaMind: Reconstructing Dynamic Visual Scenes from EEG by Aligning Temporal Dynamics and Multimodal Semantics to Guided Diffusion
Junxiang Liu
Junming Lin
Jiangtong Li
Jie Li
DiffM, VGen
83
1
0
01 Sep 2025
What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos
Qiyue Sun
Qiming Huang
Yang Yang
Hongjun Wang
Jianbo Jiao
201
0
0
29 Aug 2025
Unsupervised Video Continual Learning via Non-Parametric Deep Embedded Clustering
Nattapong Kurpukdee
Adrian G. Bors
136
0
0
29 Aug 2025
Looking Beyond the Obvious: A Survey on Abstract Concept Recognition for Video Understanding
Gowreesh Mago
Pascal Mettes
Stevan Rudinac
132
0
0
28 Aug 2025
AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning
Shu Shen
Chao Chen
Tong Zhang
196
0
0
27 Aug 2025
Two-Stage Framework for Efficient UAV-Based Wildfire Video Analysis with Adaptive Compression and Fire Source Detection
Yanbing Bai
Rui-Yang Ju
Lemeng Zhao
Junjie Hu
Jianchao Bi
Erick Mas
Shunichi Koshimura
102
0
0
22 Aug 2025
VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models
Haidong Xu
Guangwei Xu
Zhedong Zheng
Xiatian Zhu
Wei Ji
Xiangtai Li
Ruijie Guo
Meishan Zhang
M. Zhang
Hao Fei
166
1
0
16 Aug 2025
Generic Event Boundary Detection via Denoising Diffusion
Jaejun Hwang
Dayoung Gong
Manjin Kim
Minsu Cho
DiffM
125
0
0
16 Aug 2025
Versatile Video Tokenization with Generative 2D Gaussian Splatting
Zhenghao Chen
Zicong Chen
Lei Liu
Yiming Wu
Dong Xu
3DGS
111
0
0
15 Aug 2025
DIVA-VQA: Detecting Inter-frame Variations in UGC Video Quality
International Conference on Information Photonics (ICIP), 2025
Xinyi Wang
Angeliki V. Katsenou
David Bull
84
1
0
14 Aug 2025
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee
Kyungho Bae
Kyle Min
Gyeong-Moon Park
J. Choi
CLL, VLM
175
0
0
14 Aug 2025
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev
Thaddäus Wiedemer
Christian Schroeder de Witt
Matthias Bethge
Wieland Brendel
A. Sophia Koepke
AuLLM
215
3
0
11 Aug 2025
Q-CLIP: Unleashing the Power of Vision-Language Models for Video Quality Assessment through Unified Cross-Modal Adaptation
Yachun Mi
Yu Li
Y. Li
Chen Hui
Tong Zhang
Zhixuan Li
Chenyue Song
Wei Yang Bryan Lim
Shaohui Liu
VLM
112
0
0
08 Aug 2025
CRAM: Large-scale Video Continual Learning with Bootstrapped Compression
Shivani Mall
Joao F. Henriques
CLL, VLM
124
0
0
07 Aug 2025
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event Localization
Jinxing Zhou
Ziheng Zhou
Yanghao Zhou
Yuxin Mao
Zhangling Duan
Dan Guo
104
2
0
06 Aug 2025
Separating Shared and Domain-Specific LoRAs for Multi-Domain Learning
Yusaku Takama
Ning Ding
Tatsuya Yokota
Toru Tamaki
134
0
0
05 Aug 2025
MoExDA: Domain Adaptation for Edge-based Action Recognition
Takuya Sugimoto
Ning Ding
Toru Tamaki
152
0
0
05 Aug 2025
SGCap: Decoding Semantic Group for Zero-shot Video Captioning
Zeyu Pan
Ping Li
Wenxiao Wang
VLM
102
0
0
02 Aug 2025
StepAL: Step-aware Active Learning for Cataract Surgical Videos
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Nisarg A. Shah
Bardia Safaei
S. Sikder
S. Vedula
Vishal M. Patel
134
1
0
29 Jul 2025
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying
Hengrui Hu
Henghui Ding
VOS
234
3
0
29 Jul 2025
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu
Yunfan Ye
Fan Zhang
Q. Zhou
Yuchuan Luo
Zhiping Cai
231
1
0
26 Jul 2025
Back to the Features: DINO as a Foundation for Video World Models
Federico Baldassarre
Marc Szafraniec
Basile Terver
Vasil Khalidov
Francisco Massa
Yann LeCun
Patrick Labatut
Maximilian Seitzer
Piotr Bojanowski
VGen
167
24
0
25 Jul 2025
Probing Multimodal Fusion in the Brain: The Dominance of Audiovisual Streams in Naturalistic Encoding
Hamid Abdollahi
Amir Hossein Mansouri Majoumerd
Amir Hossein Bagheri Baboukani
Amir Abolfazl Suratgar
Mohammad Bagher Menhaj
68
0
0
25 Jul 2025
Iwin Transformer: Hierarchical Vision Transformer using Interleaved Windows
Simin Huo
Ning Li
ViT
212
0
0
24 Jul 2025
Discovering and using Spelke segments
R. Venkatesh
Klemen Kotar
Lilian Naing Chen
Seungwoo Kim
Luca Thomas Wheeler
...
Wanhee Lee
Honglin Chen
Daniel M. Bear
Stefan Stojanov
Daniel L. K. Yamins
145
0
0
21 Jul 2025
ESG-Net: Event-Aware Semantic Guided Network for Dense Audio-Visual Event Localization
Huilai Li
Yonghao Dang
Ying Xing
Yiming Wang
Jianqin Yin
151
0
0
14 Jul 2025
Simplifying Traffic Anomaly Detection with Video Foundation Models
Svetlana Orlova
Tommie Kerssies
B. B. Englert
Gijs Dubbelman
ViT
116
1
0
12 Jul 2025
Video-Guided Text-to-Music Generation Using Public Domain Movie Collections
Haven Kim
Cheng-i Wang
Weihan Xu
Julian McAuley
Hao-Wen Dong
VGen
237
4
0
01 Jul 2025
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey and Benchmark
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Qi Qin
...
Bin Fu
Xiaokang Yang
Guangtao Zhai
Ming-Hsuan Yang
Xiaohong Liu
VLM
540
86
0
01 Jul 2025
Can Video Large Multimodal Models Think Like Doubters-or Double-Down: A Study on Defeasible Video Entailment
Yue Zhang
Jilei Sun
Yunhui Guo
Vibhav Gogate
LRM
192
1
0
27 Jun 2025
Improving Token-based Object Detection with Video
IEEE Access, 2025
Abhineet Singh
Nilanjan Ray
120
0
0
27 Jun 2025
Can Vision Language Models Understand Mimed Actions?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Hyundong Justin Cho
Spencer Lin
Tejas Srinivasan
Michael Saxon
Deuksin Kwon
Natali T. Chavez
Jonathan May
164
3
0
17 Jun 2025
Action Dubber: Timing Audible Actions via Inflectional Flow
Wenlong Wan
Weiying Zheng
Tianyi Xiang
Guiqing Li
Shengfeng He
153
0
0
16 Jun 2025
Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation
Runhao Zeng
Qi Deng
Ronghao Zhang
Shuaicheng Niu
Jian Chen
Xiping Hu
Victor C. M. Leung
TTA
117
0
0
14 Jun 2025
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Computer Vision and Pattern Recognition (CVPR), 2025
Darryl Ho
Samuel Madden
AI4TS
178
0
0
14 Jun 2025
Improving Multimodal Learning Balance and Sufficiency through Data Remixing
Xiaoyu Ma
Hao Chen
Yongjian Deng
212
4
0
13 Jun 2025
Can Sound Replace Vision in LLaVA With Token Substitution?
Ali Vosoughi
Jing Bi
Pinxin Liu
Yunlong Tang
Chenliang Xu
CLIP, VLM
320
0
0
12 Jun 2025
An Effective End-to-End Solution for Multimodal Action Recognition
International Conference on Pattern Recognition (ICPR), 2025
Songping Wang
Xiantao Hu
Yueming Lyu
Caifeng Shan
215
2
0
11 Jun 2025