The Kinetics Human Action Video Dataset

19 May 2017

Sudheendra Vijayanarasimhan

Papers citing "The Kinetics Human Action Video Dataset"

50 / 2,153 papers shown

Video-Bench: Human-Aligned Video Generation Benchmark

Computer Vision and Pattern Recognition (CVPR), 2025

...

587

07 Apr 2025

SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision

Design Automation Conference (DAC), 2025

146

06 Apr 2025

3D Scene Understanding Through Local Random Access Sequence Modeling

248

04 Apr 2025

SocialGesture: Delving into Multi-person Gesture Understanding

Computer Vision and Pattern Recognition (CVPR), 2025

230

03 Apr 2025

Multifaceted Evaluation of Audio-Visual Capability for MLLMs: Effectiveness, Efficiency, Generalizability and Robustness

337

03 Apr 2025

UniViTAR: Unified Vision Transformer with Native Resolution

486

02 Apr 2025

Learning from Streaming Video with Orthogonal Gradients

Computer Vision and Pattern Recognition (CVPR), 2025

278

02 Apr 2025

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

Computer Vision and Pattern Recognition (CVPR), 2025

346

01 Apr 2025

Sample-level Adaptive Knowledge Distillation for Action Recognition

331

01 Apr 2025

Fair Dynamic Spectrum Access via Fully Decentralized Multi-Agent Reinforcement Learning

International Symposium on Modeling and Optimization in Mobile, Ad-Hoc and Wireless Networks (WiOpt), 2025

281

31 Mar 2025

CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition

533

30 Mar 2025

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Antonia Karamolegkou

Malvina Nikandrou

Georgios Pantazopoulos

Danae Sanchez Villegas

235

28 Mar 2025

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

Abdelrahman M. Shaker

929

27 Mar 2025

Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks

Computer Vision and Pattern Recognition (CVPR), 2025

268

24 Mar 2025

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Computer Vision and Pattern Recognition (CVPR), 2025

291

24 Mar 2025

ATARS: An Aerial Traffic Atomic Activity Recognition and Temporal Segmentation Dataset

237

24 Mar 2025

Temporal Action Detection Model Compression by Progressive Block Drop

Computer Vision and Pattern Recognition (CVPR), 2025

302

21 Mar 2025

Structured-Noise Masked Modeling for Video, Audio and Beyond

320

20 Mar 2025

MASH-VLM: Mitigating Action-Scene Hallucination in Video-LLMs through Disentangled Spatial-Temporal Representations

Computer Vision and Pattern Recognition (CVPR), 2025

298

20 Mar 2025

FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding

362

19 Mar 2025

Efficient Motion-Aware Video MLLM

Computer Vision and Pattern Recognition (CVPR), 2025

257

17 Mar 2025

Action tube generation by person query matching for spatio-temporal action detection

Kazuki Omi

Jion Oshima

Toru Tamaki

376

17 Mar 2025

Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition

332

17 Mar 2025

VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining

352

16 Mar 2025

Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction

319

14 Mar 2025

KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception

Computer Vision and Pattern Recognition (CVPR), 2025

378

13 Mar 2025

4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models

Computer Vision and Pattern Recognition (CVPR), 2025

429

13 Mar 2025

STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications

Andrew Gao

Jun Liu

AI4TS

212

11 Mar 2025

HERO: Human Reaction Generation from Videos

319

11 Mar 2025

TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos

316

09 Mar 2025

End-to-End Action Segmentation Transformer

Tieqiao Wang

Sinisa Todorovic

ViT

292

08 Mar 2025

Secure On-Device Video OOD Detection Without Backpropagation

294

08 Mar 2025

Exploring Simple Siamese Network for High-Resolution Video Quality Assessment

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025

185

04 Mar 2025

Semi-Supervised Audio-Visual Video Action Recognition with Audio Source Localization Guided Mixup

Seokun Kang

Taehwan Kim

272

04 Mar 2025

Attention Bootstrapping for Multi-Modal Test-Time Adaptation

AAAI Conference on Artificial Intelligence (AAAI), 2025

298

04 Mar 2025

HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization

Computer Vision and Pattern Recognition (CVPR), 2025

430

03 Mar 2025

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

International Conference on Learning Representations (ICLR), 2025

...

274

02 Mar 2025

AgroLLM: Connecting Farmers and Agricultural Practices through Large Language Models for Enhanced Knowledge Transfer and Practical Application

Dinesh Jackson Samuel

Inna Skarga-Bandurova

David Sikolia

Muhammad Awais

270

28 Feb 2025

The PanAf-FGBG Dataset: Understanding the Impact of Backgrounds in Wildlife Behaviour Recognition

Computer Vision and Pattern Recognition (CVPR), 2025

...

317

28 Feb 2025

Two-Stream Spatial-Temporal Transformer Framework for Person Identification via Natural Conversational Keypoints

190

28 Feb 2025

Subtask-Aware Visual Reward Learning from Segmented Demonstrations

International Conference on Learning Representations (ICLR), 2025

235

28 Feb 2025

Learning to Generalize without Bias for Open-Vocabulary Action Recognition

324

27 Feb 2025

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

...

Juan Carlos León Alcázar

297

27 Feb 2025

Balanced Representation Learning for Long-tailed Skeleton-based Action Recognition

Machine Intelligence Research (MIR), 2023

284

24 Feb 2025

Multi-Dimensional Quality Assessment for Text-to-3D Assets: Dataset and Model

IEEE Transactions on Multimedia (TMM), 2025

151

24 Feb 2025

Fine-Grained Captioning of Long Videos through Scene Graph Consolidation

Sanghyeok Chu

Seonguk Seo

Bohyung Han

604

23 Feb 2025

Black Sheep in the Herd: Playing with Spuriously Correlated Attributes for Vision-Language Recognition

International Conference on Learning Representations (ICLR), 2025

309

19 Feb 2025

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

701

18 Feb 2025

EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

164

17 Feb 2025

Improving action segmentation via explicit similarity measurement

268

15 Feb 2025