The Kinetics Human Action Video Dataset

19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,152 papers shown
CMOSE: Comprehensive Multi-Modality Online Student Engagement Dataset with High-Quality Labels
Chi-hsuan Wu
Shih-yang Liu
Xijie Huang
Xingbo Wang
Rong Zhang
Luca Minciullo
Wong Kai Yiu
Kenny Kwan
Kwang-Ting Cheng
272
6
0
14 Dec 2023
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Yogesh S Rawat
VLM
273
11
0
13 Dec 2023
Counterfactual World Modeling for Physical Dynamics Understanding
Rahul Venkatesh
Honglin Chen
Kevin T. Feigelis
Daniel M. Bear
Khaled Jedoui
...
Wanhee Lee
Sherry Liu
Kevin A. Smith
Judith E. Fan
Daniel L. K. Yamins
VGen
309
7
0
11 Dec 2023
A Cascaded Neural Network System For Rating Student Performance In Surgical Knot Tying Simulation
IEEE International Conference on Healthcare Informatics (ICHI), 2023
Yunzhe Xue
Olanrewaju A Eletta
J. Ady
Nell M. Patel
Advaith Bongu
Usman Roshan
227
3
0
09 Dec 2023
A Review of Machine Learning Methods Applied to Video Analysis Systems
Asilomar Conference on Signals, Systems and Computers (ACSSC), 2023
Marios S. Pattichis
Venkatesh Jatla
Alvaro E. Ullao Cerna
109
7
0
08 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
252
29
0
07 Dec 2023
The Potential of Vision-Language Models for Content Moderation of Children's Videos
Syed Hammad Ahmed
Shengnan Hu
G. Sukthankar
VLM
252
4
0
06 Dec 2023
From Detection to Action Recognition: An Edge-Based Pipeline for Robot Human Perception
Petros Toupas
Georgios Tsamis
Dimitrios Giakoumis
K. Votis
Dimitrios Tzovaras
141
1
0
06 Dec 2023
Deep Multimodal Fusion for Surgical Feedback Classification
Rafal Kocielnik
Elyssa Y. Wong
Timothy N. Chu
Lydia Lin
De-An Huang
Jiayun Wang
A. Anandkumar
Andrew J. Hung
185
6
0
06 Dec 2023
DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Thong Nguyen
Xiaobao Wu
Xinshuai Dong
Cong-Duy Nguyen
See-Kiong Ng
Anh Tuan Luu
283
10
0
05 Dec 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Min Yang
Huan Gao
Ping Guo
Limin Wang
ViT
277
17
0
04 Dec 2023
Generating Action-conditioned Prompts for Open-vocabulary Video Action Recognition
ACM Multimedia (ACM MM), 2023
Chengyou Jia
Minnan Luo
Xiaojun Chang
Zhuohang Dang
Mingfei Han
Mengmeng Wang
Guangwen Dai
Sizhe Dang
Jingdong Wang
VLM
195
14
0
04 Dec 2023
Hulk: A Universal Knowledge Translator for Human-Centric Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Yizhou Wang
YiXuan Wu
Weizhen He
Xun Guo
...
Mengwei He
Rui Zhao
Jian Wu
Tong He
Bin Wang
VLM
709
20
0
04 Dec 2023
Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans
IEEE International Conference on Robotics and Automation (ICRA), 2023
Homanga Bharadhwaj
Abhi Gupta
Vikash Kumar
Shubham Tulsiani
LM&Ro
316
58
0
01 Dec 2023
Just Add $π$! Pose Induced Video Transformers for Understanding Activities of Daily Living
Computer Vision and Pattern Recognition (CVPR), 2023
Dominick Reilly
Srijan Das
ViT
296
27
0
30 Nov 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Neural Information Processing Systems (NeurIPS), 2023
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
341
30
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
European Conference on Computer Vision (ECCV), 2023
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
327
0
0
30 Nov 2023
Source-free Video Domain Adaptation by Learning from Noisy Labels
Pattern Recognition (Pattern Recogn.), 2023
A. Dasgupta
C. V. Jawahar
Karteek Alahari
TTA VLM
490
13
0
30 Nov 2023
VBench: Comprehensive Benchmark Suite for Video Generative Models
Computer Vision and Pattern Recognition (CVPR), 2023
Ziqi Huang
Yinan He
Jiashuo Yu
Fan Zhang
Chenyang Si
...
Xinyuan Chen
Limin Wang
Dahua Lin
Yu Qiao
Ziwei Liu
VGen
514
968
0
29 Nov 2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition
Jinhui Ye
Jiaming Zhou
Hui Xiong
Junwei Liang
ViT
111
1
0
29 Nov 2023
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Computer Vision and Pattern Recognition (CVPR), 2023
Chi-Hsi Kung
Shu-Wei Lu
Yi-Hsuan Tsai
Yi-Ting Chen
361
15
0
29 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
392
1
0
28 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Computer Vision and Pattern Recognition (CVPR), 2023
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
346
50
0
28 Nov 2023
F4D: Factorized 4D Convolutional Neural Network for Efficient Video-level Representation Learning
International Conference on Agents and Artificial Intelligence (ICAART), 2023
Mohammad Al-Saad
Lakshmish Ramaswamy
S. Bhandarkar
AI4TS
150
3
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM MLLM
211
91
0
27 Nov 2023
Temporal Action Localization for Inertial-based Human Activity Recognition
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023
Marius Bock
Michael Moeller
Kristof Van Laerhoven
166
6
0
27 Nov 2023
Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation : A Unified Approach
Ayush K. Rai
Tarun Krishna
Feiyan Hu
Alexandru Drimbarean
Kevin McGuinness
Alan F. Smeaton
Noel E. O'Connor
298
11
0
27 Nov 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
303
13
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
287
15
0
27 Nov 2023
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaohan Ding
Yiyuan Zhang
Yixiao Ge
Sijie Zhao
Lin Song
Xiangyu Yue
Ying Shan
VLM AI4TS SSL
258
224
0
27 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
267
4
0
25 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
European Conference on Computer Vision (ECCV), 2023
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM MLLM
307
38
0
25 Nov 2023
Decouple Content and Motion for Conditional Image-to-Video Generation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Cuifeng Shen
Yulu Gan
Chen Chen
Xiongwei Zhu
Lele Cheng
Yan Li
Jinzhi Wang
VGen DiffM
217
9
0
24 Nov 2023
Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks
Amrit Nagarajan
Anand Raghunathan
VLM ViT
64
0
0
22 Nov 2023
Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects
Boyang Yu
Aakash Kaku
Kangning Liu
A. Parnandi
Emily E Fokas
Anita Venkatesan
Natasha Pandit
Rajesh Ranganath
Heidi M. Schambra
C. Fernandez‐Granda
214
2
0
21 Nov 2023
GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Hyogun Lee
Kyungho Bae
Seong Jong Ha
Yumin Ko
Gyeong-Moon Park
Jinwoo Choi
220
3
0
21 Nov 2023
Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models
Pooya Fayyazsanavi
Negar Nejatishahidin
Jana Kosecka
SLR
192
6
0
20 Nov 2023
A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow
Jaemin Lee
Min-seok Seo
Sangwoo Lee
Hyobin Park
Dong-Geol Choi
309
0
0
20 Nov 2023
HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment
Shreshth Saini
Avinab Saha
A. Bovik
349
8
0
18 Nov 2023
Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models
Heeseon Kim
Minji Son
Minbeom Kim
Myung-Joon Kwon
Changick Kim
AAML
267
11
0
17 Nov 2023
JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing
Shester Gueuwou
Sophie Siake
Colin Leong
Mathias Müller
SLR
322
21
0
16 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
Computer Vision and Pattern Recognition (CVPR), 2023
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
137
28
0
15 Nov 2023
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings
ACM Multimedia (ACM MM), 2023
Yachun Mi
Yu Li
Yan Shu
Chen Hui
Puchao Zhou
Gangyan Zeng
177
10
0
13 Nov 2023
PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Amirhossein Dadashzadeh
Shuchao Duan
Alan Whone
Majid Mirmehdi
238
22
0
11 Nov 2023
PolyMaX: General Dense Prediction with Mask Transformer
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
293
24
0
09 Nov 2023
CLearViD: Curriculum Learning for Video Description
Cheng-Yu Chuang
Pooyan Fazli
152
1
0
08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Siddharth Srivastava
Gaurav Sharma
SSL
288
83
0
07 Nov 2023
ACQUIRED: A Dataset for Answering Counterfactual Questions In Real-Life Videos
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Te-Lin Wu
Zi-Yi Dou
Qingyuan Hu
Yu Hou
Nischal Reddy Chandra
Marjorie Freedman
R. Weischedel
Nanyun Peng
282
9
0
02 Nov 2023
POS: A Prompts Optimization Suite for Augmenting Text-to-Video Generation
Shijie Ma
Huayi Xu
Mengjian Li
Weidong Geng
Yaxiong Wang
Meng Wang
DiffM VGen
169
1
0
02 Nov 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Neural Information Processing Systems (NeurIPS), 2023
Jieming Cui
Ziren Gong
Baoxiong Jia
Siyuan Huang
Zilong Zheng
Jianzhu Ma
Yixin Zhu
212
4
0
01 Nov 2023