arXiv:2005.00214
The AVA-Kinetics Localized Human Actions Video Dataset

1 May 2020
Ang Li
Meghana Thotakuri
David A. Ross
João Carreira
Alexander Vostrikov
Andrew Zisserman
    VGen

Papers citing "The AVA-Kinetics Localized Human Actions Video Dataset"

50 / 74 papers shown
Video Understanding by Design: How Datasets Shape Architectures and Insights
Lei Wang
Piotr Koniusz
Yongsheng Gao
3DV, VGen, AI4TS
145
0
0
11 Sep 2025
DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection
Marcel Klemt
Carlotta Segna
Anna Rohrbach
128
0
0
06 Jun 2025
The Role of Video Generation in Enhancing Data-Limited Action Understanding
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Wei Li
Dezhao Luo
Dongbao Yang
Zhenhang Li
Weiping Wang
Yu Zhou
DiffM, VGen
444
0
0
26 May 2025
Post-processing for Fair Regression via Explainable SVD
International Conference on Artificial Intelligence and Statistics (AISTATS), 2025
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
735
1
0
04 Apr 2025
TRACE: Real-Time Multimodal Common Ground Tracking in Situated Collaborative Dialogues
Hannah VanderHoeven
Brady Bhalla
Ibrahim Khebour
Austin Youngren
Videep Venkatesha
...
Yifan Zhu
Kenneth Lai
Changsoo Jung
James Pustejovsky
Nikhil Krishnaswamy
108
6
0
12 Mar 2025
PromptGAR: Flexible Promptive Group Activity Recognition
Zhangyu Jin
Andrew Feng
Ankur Chemburkar
Celso M. De Melo
VLM
204
0
0
11 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
215
0
0
11 Feb 2025
Human Activity Recognition in an Open World
Journal of Artificial Intelligence Research (JAIR), 2022
D. Prijatelj
Samuel Grieggs
Jin Huang
Dawei Du
Ameya Shringi
Christopher Funk
Adam Kaufman
Eric Robertson
Walter J. Scheirer (University of Notre Dame)
245
4
0
17 Jan 2025
Visual WetlandBirds Dataset: Bird Species Identification and Behavior Recognition in Videos
Scientific Data (Sci Data), 2025
Javier Rodriguez-Juan
David Ortiz-Perez
Manuel Benavent-Lledo
David Mulero-Pérez
Pablo Ruiz-Ponce
Adrian Orihuela-Torres
José García Rodríguez
Esther Sebastián-González
86
5
0
15 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Computer Vision and Pattern Recognition (CVPR), 2025
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD, VLM
432
6
0
14 Jan 2025
Technical Report for ActivityNet Challenge 2022 -- Temporal Action Localization
Shimin Chen
Wei Li
Jianyang Gu
Chong Chen
Yandong Guo
124
0
0
31 Oct 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
IEEE Transactions on Multimedia (IEEE TMM), 2024
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
171
1
0
02 Sep 2024
DLM-VMTL: A Double Layer Mapper for heterogeneous data video Multi-task prompt learning
Zeyi Bo
Wuxi Sun
Ye Jin
VLM
176
0
0
29 Aug 2024
Masked Image Modeling: A Survey
International Journal of Computer Vision (IJCV), 2024
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
Andrii Zadaianchuk
367
15
0
13 Aug 2024
Conformance Checking of Fuzzy Logs against Declarative Temporal Specifications
Ivan Donadello
Paolo Felli
Craig Innes
F. Maggi
Marco Montali
83
5
0
17 Jun 2024
BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos
Isla Duporge
Maksim Kholiavchenko
Roi Harel
Scott Wolf
Daniel Rubenstein
...
Stephen Lee
Julie Barreau
Jenna Kline
Michelle Ramirez
Charles V. Stewart
136
23
0
27 May 2024
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu
Runyu He
Gangshan Wu
Limin Wang
3DH
206
12
0
06 Apr 2024
Common Ground Tracking in Multimodal Dialogue
Ibrahim Khebour
Kenneth Lai
Mariah Bradford
Yifan Zhu
R. Brutti
...
Jingxuan Tu
Benjamin Ibarra
Nathaniel Blanchard
Nikhil Krishnaswamy
James Pustejovsky
107
18
0
26 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
293
59
0
20 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
Florentin Wörgötter
Alexander S. Ecker
254
12
0
29 Jan 2024
Pixel-Wise Recognition for Holistic Surgical Scene Understanding
Nicolás Ayobi
Santiago Rodríguez
Alejandra Pérez
Isabela Hernández
Nicolás Aparicio
...
Sebastián Pena
J. Santander
J. Caicedo
Nicolás Fernández
Pablo Arbelaez
ViT, MedIm
157
27
0
20 Jan 2024
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Computer Vision and Pattern Recognition (CVPR), 2023
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
160
6
0
29 Dec 2023
Student Classroom Behavior Detection based on Spatio-Temporal Network and Multi-Model Fusion
Fan Yang
Xiaofei Wang
186
2
0
25 Oct 2023
SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior
Fan Yang
Tao Wang
82
28
0
04 Oct 2023
Accurate and Fast Compressed Video Captioning
IEEE International Conference on Computer Vision (ICCV), 2023
Yaojie Shen
Xin Gu
Kai Xu
Hengrui Fan
Longyin Wen
Libo Zhang
ViT
116
40
0
22 Sep 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
149
1
0
20 Sep 2023
Joint learning of images and videos with a single Vision Transformer
Shuki Shimizu
Toru Tamaki
ViT
109
0
0
21 Aug 2023
Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions
PLoS ONE, 2023
Michael Joannou
P. Rotshtein
U. Noppeney
121
0
0
18 Aug 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Huayu Chen
...
Florian Schroff
Hartwig Adam
Ming-Hsuan Yang
Ting Liu
Boqing Gong
ELM
133
14
0
06 Jul 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
159
19
0
24 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
274
42
0
03 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Computer Vision and Pattern Recognition (CVPR), 2023
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
195
9
0
29 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Computer Vision and Pattern Recognition (CVPR), 2023
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
265
474
0
29 Mar 2023
CycleACR: Cycle Modeling of Actor-Context Relations for Video Action Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
99
3
0
28 Mar 2023
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms
MediaEval Benchmarking Initiative for Multimedia Evaluation (MediaEval), 2023
Pierre-Etienne Martin
100
2
0
06 Feb 2023
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022
MediaEval Benchmarking Initiative for Multimedia Evaluation (MediaEval), 2023
Pierre-Etienne Martin
J. Calandre
Boris Mansencal
J. Benois-Pineau
Renaud Péteri
L. Mascarilla
J. Morlier
94
4
0
31 Jan 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
121
2
0
17 Jan 2023
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM, VGen
273
422
0
06 Dec 2022
Baby Physical Safety Monitoring in Smart Home Using Action Recognition System
SoutheastCon, 2022
Victor A. Adewopo
Nelly Elsayed
Kelly Anderson
135
7
0
22 Oct 2022
Exploiting Instance-based Mixed Sampling via Auxiliary Source Domain Supervision for Domain-adaptive Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Yifan Lu
Gurkirt Singh
Suman Saha
Luc Van Gool
TTA
202
3
0
28 Sep 2022
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations
Neural Information Processing Systems (NeurIPS), 2022
Ahmad Darkhalil
Dandan Shan
Bin Zhu
Jian Ma
Amlan Kar
Richard E. L. Higgins
Sanja Fidler
David Fouhey
Dima Damen
VOS
180
124
0
26 Sep 2022
Actor-identified Spatiotemporal Action Detection -- Detecting Who Is Doing What in Videos
Fan Yang
Norimichi Ukita
S. Sakti
Satoshi Nakamura
171
0
0
27 Aug 2022
Spotting Temporally Precise, Fine-Grained Events in Video
European Conference on Computer Vision (ECCV), 2022
James Hong
Haotian Zhang
Michael Gharbi
Matthew Fisher
Kayvon Fatahalian
190
47
0
20 Jul 2022
Fine-grained Activities of People Worldwide
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
J. Byrne
Greg Castañón
Zhongheng Li
G. Ettinger
135
5
0
11 Jul 2022
Beyond Transfer Learning: Co-finetuning for Action Localisation
Anurag Arnab
Xuehan Xiong
A. Gritsenko
Rob Romijnders
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
138
10
0
08 Jul 2022
Self-Supervised Learning for Videos: A Survey
ACM Computing Surveys (ACM CSUR), 2022
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
318
161
0
18 Jun 2022
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Lin Sui
Chen-Da Liu-Zhang
Lixin Gu
Feng Han
167
12
0
07 Jun 2022
A Survey on Video Action Recognition in Sports: Datasets, Methods and Applications
IEEE Transactions on Multimedia (IEEE TMM), 2022
Fei Wu
Qingzhong Wang
Jian Bian
Haoyi Xiong
Ning Ding
Feixiang Lu
Junqing Cheng
Dejing Dou
AI4TS
167
75
0
02 Jun 2022
Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey
Gaoang Wang
Xiuming Zhang
Lei Li
VOT
262
20
0
22 May 2022
3D Convolutional Networks for Action Recognition: Application to Sport Gesture Recognition
Pierre-Etienne Martin
J. Benois-Pineau
Renaud Péteri
A. Zemmari
J. Morlier
113
5
0
13 Apr 2022