The Kinetics Human Action Video Dataset

19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,153 papers shown
Video-Bench: Human-Aligned Video Generation Benchmark
Computer Vision and Pattern Recognition (CVPR), 2025
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
You Li
Jing Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
587
13
0
07 Apr 2025
SnapPix: Efficient-Coding–Inspired In-Sensor Compression for Edge Vision
Design Automation Conference (DAC), 2025
Weikai Lin
Tianrui Ma
Adith Boloor
Yu Feng
Ruofan Xing
Xuan Zhang
Yuhao Zhu
146
0
0
06 Apr 2025
3D Scene Understanding Through Local Random Access Sequence Modeling
Wanhee Lee
Klemen Kotar
R. Venkatesh
Jared Watrous
Honglin Chen
Khai Loong Aw
Daniel L. K. Yamins
3DV
248
3
0
04 Apr 2025
SocialGesture: Delving into Multi-person Gesture Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Xu Cao
Pranav Virupaksha
Wenqi Jia
Bolin Lai
Fiona Ryan
Sangmin Lee
James M. Rehg
SLR
230
5
0
03 Apr 2025
Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness
Yusheng Zhao
Junyu Luo
Zhiyuan Ning
Weizhi Zhang
Zhiping Xiao
Wei Ju
Philip S. Yu
Ming Zhang
AuLLM
337
0
0
03 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
486
3
0
02 Apr 2025
Learning from Streaming Video with Orthogonal Gradients
Computer Vision and Pattern Recognition (CVPR), 2025
Tengda Han
Dilara Gokay
Joseph Heyward
Chuhan Zhang
Daniel Zoran
Viorica Patraucean
João Carreira
Dima Damen
Andrew Zisserman
278
5
0
02 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
346
3
0
01 Apr 2025
Sample-level Adaptive Knowledge Distillation for Action Recognition
Ping Li
Chenhao Ping
Wenxiao Wang
Mingli Song
331
3
0
01 Apr 2025
Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning
International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), 2025
Yubo Zhang
Pedro Botelho
Trevor Gordon
Gil Zussman
I. Kadota
281
1
0
31 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
533
0
0
30 Mar 2025
Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Antonia Karamolegkou
Malvina Nikandrou
Georgios Pantazopoulos
Danae Sanchez Villegas
Phillip Rust
Ruchira Dhar
Daniel Hershcovich
Anders Søgaard
235
2
0
28 Mar 2025
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
Abdelrahman M. Shaker
Muhammad Maaz
Chenhui Gou
Hamid Rezatofighi
Salman Khan
Fahad Shahbaz Khan
929
3
0
27 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Computer Vision and Pattern Recognition (CVPR), 2025
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
268
1
0
24 Mar 2025
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Computer Vision and Pattern Recognition (CVPR), 2025
Chengxiang Huang
Yake Wei
Zequn Yang
D. Hu
291
7
0
24 Mar 2025
ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset
Zihao Chen
Hsuanyu Wu
Chi-Hsi Kung
Yi-Ting Chen
Yan-Tsung Peng
237
1
0
24 Mar 2025
Temporal Action Detection Model Compression by Progressive Block Drop
Computer Vision and Pattern Recognition (CVPR), 2025
Xiaoyong Chen
Yong Guo
Jiaming Liang
Sitong Zhuang
Runhao Zeng
Xiping Hu
302
1
0
21 Mar 2025
Structured-Noise Masked Modeling for Video, Audio and Beyond
Aritra Bhowmik
Fida Mohammad Thoker
Carlos Hinojosa
Bernard Ghanem
Cees G. M. Snoek
VGen
320
0
0
20 Mar 2025
MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations
Computer Vision and Pattern Recognition (CVPR), 2025
Kyungho Bae
Jinhyung Kim
Sihaeng Lee
Soonyoung Lee
G. Lee
Jinwoo Choi
298
12
0
20 Mar 2025
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu
Lin Zhang
Pengtao Chen
Peng Ye
Xianfang Zeng
Wei Cheng
Gang Yu
Tao Chen
362
8
0
19 Mar 2025
Efficient Motion-Aware Video MLLM
Computer Vision and Pattern Recognition (CVPR), 2025
Zijia Zhao
Yuqi Huo
Tongtian Yue
Longteng Guo
Haoyu Lu
Binghai Wang
Xin Wu
Qingbin Liu
257
4
0
17 Mar 2025
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
376
0
0
17 Mar 2025
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition
Shristi Das Biswas
Efstathia Soufleri
Arani Roy
Kaushik Roy
332
1
0
17 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
352
2
0
16 Mar 2025
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang
Qixiang Zhang
Lehan Wang
Xuanqi Huang
Xiaomeng Li
VOS
VGen
319
3
0
14 Mar 2025
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Computer Vision and Pattern Recognition (CVPR), 2025
Yunpeng Qu
Kun Yuan
Qizhi Xie
Ming-Ting Sun
Chao Zhou
Jian Wang
378
5
0
13 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGS
VLM
429
18
0
13 Mar 2025
STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
Andrew Gao
Jun Liu
AI4TS
212
1
0
11 Mar 2025
HERO: Human Reaction Generation from Videos
Chengjun Yu
Wei-dong Zhai
Yuhang Yang
Yang Cao
Zheng-jun Zha
VGen
319
5
0
11 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
316
3
0
09 Mar 2025
End-to-End Action Segmentation Transformer
Tieqiao Wang
Sinisa Todorovic
ViT
292
1
0
08 Mar 2025
Secure On-Device Video OOD Detection Without Backpropagation
Li Li
Peilin Cai
Yuxiao Zhou
Zhiyu Ni
Renjie Liang
You Qin
Yi Nian
Zhuowen Tu
Xiyang Hu
Yue Zhao
OODD
FedML
294
9
0
08 Mar 2025
Exploring Simple Siamese Network for High-Resolution Video Quality Assessment
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Guotao Shen
Ziheng Yan
Xin Jin
Longhai Wu
Jie Chen
Ilhyun Cho
Cheul-hee Hahm
185
0
0
04 Mar 2025
Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup
Seokun Kang
Taehwan Kim
272
0
0
04 Mar 2025
Attention Bootstrapping for Multi-Modal Test-Time Adaptation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Yusheng Zhao
Junyu Luo
Xiao Luo
Jinsheng Huang
Jingyang Yuan
Zhiping Xiao
Min Zhang
TTA
298
2
0
04 Mar 2025
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Computer Vision and Pattern Recognition (CVPR), 2025
Zitang Zhou
Ke Mei
Yu Lu
Tianyi Wang
Fengyun Rao
430
7
0
03 Mar 2025
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
International Conference on Learning Representations (ICLR), 2025
Baoqi Pei
Yuanmin Huang
Jilan Xu
Guo Chen
Yuping He
...
Yali Wang
Weidi Xie
Yu Qiao
Leilei Gan
Limin Wang
274
11
0
02 Mar 2025
AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application
Dinesh Jackson Samuel
Inna Skarga-Bandurova
David Sikolia
Muhammad Awais
270
2
0
28 Feb 2025
The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition
Computer Vision and Pattern Recognition (CVPR), 2025
Otto Brookes
Maksim Kukushkin
Majid Mirmehdi
Colleen Stephens
Paula Dieguez
...
Lukas Boesch
Thomas Schmid
M. Arandjelovic
H. Kühl
T. Burghardt
317
2
0
28 Feb 2025
Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints
Masoumeh Chapariniya
Hossein Ranjbar
Teodora Vukovic
Sarah Ebling
Volker Dellwo
3DPC
190
0
0
28 Feb 2025
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
International Conference on Learning Representations (ICLR), 2025
Changyeon Kim
Minho Heo
Doohyun Lee
Jinwoo Shin
Honglak Lee
Joseph J. Lim
Kimin Lee
235
3
0
28 Feb 2025
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu
Congqi Cao
Yifan Zhang
Yanning Zhang
VLM
324
4
0
27 Feb 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
297
6
0
27 Feb 2025
Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition
Machine Intelligence Research (MIR), 2023
Hongda Liu
Yunlong Wang
Min Ren
Junxing Hu
Zhengquan Luo
Guangqi Hou
Zhe Sun
284
3
0
24 Feb 2025
Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model
IEEE Transactions on Multimedia (TMM), 2025
Kang Fu
Huiyu Duan
Zicheng Zhang
Xiaohong Liu
Xiongkuo Min
Jia Wang
Guoquan Zheng
EGVM
151
4
0
24 Feb 2025
Fine-Grained Captioning of Long Videos through Scene Graph Consolidation
Sanghyeok Chu
Seonguk Seo
Bohyung Han
604
1
0
23 Feb 2025
Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition
International Conference on Learning Representations (ICLR), 2025
Xinyu Tian
Shu Zou
Zhaoyuan Yang
Mengqi He
Jing Zhang
VLM
309
5
0
19 Feb 2025
MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation
Sihyun Yu
Meera Hahn
Dan Kondratyuk
Jinwoo Shin
Agrim Gupta
José Lezama
Irfan Essa
David A. Ross
Jonathan Huang
DiffM
VGen
701
6
0
18 Feb 2025
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Junhyeok Kim
Min Soo Kim
Jiwan Chung
Jungbin Cho
Jisoo Kim
Sungwoong Kim
Gyeongbo Sim
Youngjae Yu
EgoV
164
3
0
17 Feb 2025
Improving action segmentation via explicit similarity measurement
Kamel Aouaidjia
Wenhao Zhang
Aofan Li
Chongsheng Zhang
268
0
0
15 Feb 2025