1705.06950
The Kinetics Human Action Video Dataset
19 May 2017
W. Kay
João Carreira
Karen Simonyan
Brian Zhang
Chloe Hillier
Sudheendra Vijayanarasimhan
Fabio Viola
Tim Green
T. Back
Apostol Natsev
Mustafa Suleyman
Andrew Zisserman
Papers citing
"The Kinetics Human Action Video Dataset"
50 / 2,152 papers shown
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
IEEE Conference on Computer Communications (IEEE INFOCOM), 2025
Linyi Jiang
Silvery Fu
Yifei Zhu
Bo Li
ViT
890
2
0
14 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
392
1
0
12 Feb 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
327
0
0
11 Feb 2025
A Survey on Mamba Architecture for Vision Applications
Fady Ibrahim
Guangjun Liu
Guanghui Wang
Mamba
432
9
0
11 Feb 2025
History-Guided Video Diffusion
Kiwhan Song
Boyuan Chen
Max Simchowitz
Yilun Du
Russ Tedrake
Vincent Sitzmann
VGen
554
65
0
10 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
697
11
0
07 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
329
0
0
04 Feb 2025
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation
Yang Cao
Zhao Song
Chiwun Yang
VGen
511
11
0
01 Feb 2025
Minimalistic Video Saliency Prediction via Efficient Decoder & Spatio Temporal Action Cues
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Rohit Girmaji
Siddharth Jain
Bhav Beri
Sarthak Bansal
Vineet Gandhi
ViT
207
4
0
01 Feb 2025
Can masking background and object reduce static bias for zero-shot action recognition?
Conference on Multimedia Modeling (MMM), 2025
Takumi Fukuzawa
Kensho Hara
Hirokatsu Kataoka
Toru Tamaki
450
4
0
22 Jan 2025
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
...
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
543
120
0
21 Jan 2025
Human Activity Recognition in an Open World
Journal of Artificial Intelligence Research (JAIR), 2022
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer (University of Notre Dame)
390
4
0
17 Jan 2025
A Multimodal Dataset for Enhancing Industrial Task Monitoring and Engagement Prediction
IEEE/ACM International Conference on Human-Robot Interaction (HRI), 2025
Naval Kishore Mehta
Arvind
Himanshu Kumar
Abeer Banerjee
Sumeet Saurav
Sanjay Singh
199
0
0
10 Jan 2025
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos
Neural Information Processing Systems (NeurIPS), 2024
Luigi Seminara
G. Farinella
Antonino Furnari
488
21
0
10 Jan 2025
MS-Temba: Multi-Scale Temporal Mamba for Understanding Long Untrimmed Videos
Arkaprava Sinha
Monish Soundar Raj
Pu Wang
Ahmed Helmy
Srijan Das
Mamba
577
5
0
10 Jan 2025
OV-HHIR: Open Vocabulary Human Interaction Recognition Using Cross-modal Integration of Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
L. Ray
Bo Zhou
Sungho Suh
P. Lukowicz
VLM
151
0
0
03 Jan 2025
MLVU: Benchmarking Multi-task Long Video Understanding
Computer Vision and Pattern Recognition (CVPR), 2024
Yueze Wang
Yan Shu
Bo Zhao
Boya Wu
Junjie Zhou
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
539
11
0
03 Jan 2025
Cross-Modal Fusion and Attention Mechanism for Weakly Supervised Video Anomaly Detection
Ayush Ghadiya
P. Kar
Vishal M. Chudasama
Pankaj Wasnik
355
18
0
31 Dec 2024
GFG -- Gender-Fair Generation: A CALAMITA Challenge
Simona Frenda
Andrea Piergentili
Beatrice Savoldi
Marco Madeddu
Martina Rosola
Silvia Casola
Chiara Ferrando
V. Patti
Matteo Negri
L. Bentivogli
313
12
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
863
108
0
31 Dec 2024
Finger in Camera Speaks Everything: Unconstrained Air-Writing for Real-World
Meiqi Wu
Kaiqi Huang
Yuanqiang Cai
Shiyu Hu
Yuzhong Zhao
Weiqiang Wang
VGen
210
7
0
27 Dec 2024
Sensitive Image Classification by Vision Transformers
IEEE International Conference on Systems, Man and Cybernetics (SMC), 2024
Hanxian He
Campbell Wilson
Thanh Thi Nguyen
Janis Dalins
ViT
320
1
0
21 Dec 2024
LEARN: A Unified Framework for Multi-Task Domain Adapt Few-Shot Learning
Bharadwaj Ravichandran
Alexander Lynch
S. Brockman
Brandon RichardWebster
Dawei Du
A. Hoogs
Christopher Funk
ObjD
VLM
394
0
0
20 Dec 2024
Query-centric Audio-Visual Cognition Network for Moment Retrieval, Segmentation and Step-Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yunbin Tu
Liang-Sheng Li
Li Su
Qingming Huang
298
1
0
18 Dec 2024
Do Language Models Understand Time?
The Web Conference (WWW), 2024
Xi Ding
Lei Wang
941
10
0
18 Dec 2024
JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts
AAAI Conference on Artificial Intelligence (AAAI), 2024
Taein Son
Soo Won Seo
Jisong Kim
S. Lee
Jun Won Choi
VGen
367
2
0
18 Dec 2024
Move-in-2D: 2D-Conditioned Human Motion Generation
Computer Vision and Pattern Recognition (CVPR), 2024
Hsin-Ping Huang
Yang Zhou
Jui-Hsien Wang
Difan Liu
Feng Liu
Ming-Hsuan Yang
Zhan Xu
VGen
DiffM
195
4
0
17 Dec 2024
Gramian Multimodal Representation Learning and Alignment
International Conference on Learning Representations (ICLR), 2024
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
463
26
0
16 Dec 2024
Training Strategies for Isolated Sign Language Recognition
Journal of WSCG (WSCG), 2024
Karina Kvanchiani
Roman Kraynov
Elizaveta Petrova
Petr Surovcev
Aleksandr Nagaev
A. Kapitanov
439
1
0
16 Dec 2024
Uni-AdaFocus: Spatial-temporal Dynamic Computation for Video Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yulin Wang
Haoji Zhang
Yang Yue
Shiji Song
Chao Deng
Junlan Feng
Gao Huang
286
12
0
15 Dec 2024
Repetitive Action Counting with Hybrid Temporal Relation Modeling
IEEE Transactions on Multimedia (IEEE TMM), 2024
Kun Li
Xinge Peng
Dan Guo
Xun Yang
Meng Wang
243
29
0
10 Dec 2024
Policy-shaped prediction: avoiding distractions in model-based reinforcement learning
Neural Information Processing Systems (NeurIPS), 2024
Miles Hutson
Isaac Kauvar
Nick Haber
323
1
0
08 Dec 2024
Reinforcement Learning from Wild Animal Videos
Elliot Chane-Sane
Constant Roux
O. Stasse
Nicolas Mansard
952
1
0
05 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Zhiyong Yang
Xiangyu Yue
MLLM
AuLLM
VLM
279
25
0
03 Dec 2024
SEAL: Semantic Attention Learning for Long Video Representation
Computer Vision and Pattern Recognition (CVPR), 2024
Lan Wang
Yujia Chen
Wen-Sheng Chu
Vishnu Boddeti
Du Tran
VLM
635
7
0
02 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
278
4
0
02 Dec 2024
OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions
Guanyu Zhou
Xiaohan Yu
Wenxin Huang
Xuemei Jia
Zhuo Zhou
Chia-Wen Lin
CML
399
1
0
24 Nov 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
313
0
0
19 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Computer Vision and Pattern Recognition (CVPR), 2024
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
553
19
0
18 Nov 2024
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do
Munchurl Kim
603
1
0
16 Nov 2024
DiMoDif: Discourse Modality-information Differentiation for Audio-visual Deepfake Detection and Localization
C. Koutlis
Symeon Papadopoulos
424
7
0
15 Nov 2024
Weakly-Supervised Anomaly Detection in Surveillance Videos Based on Two-Stream I3D Convolution Network
Sareh Nejad
Anwar Haque
181
6
0
13 Nov 2024
Pay Attention to the Keys: Visual Piano Transcription Using Transformers
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Uros Zivanovic
Ivan Pilkov
Carlos Eduardo Cancino-Chacón
ViT
178
0
0
13 Nov 2024
Multi-Modal interpretable automatic video captioning
Antoine Hanna-Asaad
Decky Aspandi
Titus Zaharia
258
1
0
11 Nov 2024
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras
Thomas Strypsteen
Christos Chatzichristos
Paul P. Liang
Matthew Blaschko
M. D. Vos
398
3
0
11 Nov 2024
Extended multi-stream temporal-attention module for skeleton-based human action recognition (HAR)
Computers in Human Behavior (CHB), 2024
Faisal Mehmood
Xin Guo
Enqing Chen
Muhammad Azeem Akbar
A. Khan
Sami Ullah
331
9
0
10 Nov 2024
Improved Video VAE for Latent Video Diffusion Model
Computer Vision and Pattern Recognition (CVPR), 2024
Pingyu Wu
Kai Zhu
Yu Liu
Liming Zhao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
VGen
DiffM
175
18
0
10 Nov 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
European Conference on Computer Vision (ECCV), 2024
P. Kulkarni
Gaurav Kumar Nayak
Mubarak Shah
ViT
AI4TS
192
9
0
10 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
249
27
0
07 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Neural Information Processing Systems (NeurIPS), 2024
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Pengfei Yu
Jiajun Wu
L. Fei-Fei
VLM
290
83
0
07 Nov 2024