1705.08421
Cited By
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
23 May 2017
Chunhui Gu
Chen Sun
David A. Ross
Carl Vondrick
C. Pantofaru
Yeqing Li
Sudheendra Vijayanarasimhan
G. Toderici
Susanna Ricco
Rahul Sukthankar
Cordelia Schmid
Jitendra Malik
VGen
Papers citing
"AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions"
50 / 176 papers shown
Title
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition
Quynh Phung
Long Mai
Fabian Caba Heilbron
Feng Liu
Jia-Bin Huang
Cusuh Ham
DiffM
VGen
CoGe
108
0
0
28 Apr 2025
Post-processing for Fair Regression via Explainable SVD
Zhiqun Zuo
Ding Zhu
Mohammad Mahdi Khalili
146
0
0
04 Apr 2025
Action tube generation by person query matching for spatio-temporal action detection
Kazuki Omi
Jion Oshima
Toru Tamaki
60
0
0
17 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis
Amir Hosein Fadaei
M. Dehaqani
42
0
0
11 Feb 2025
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions
Xiaoyang Liu
Boran Wen
Xinpeng Liu
Zizheng Zhou
Hongwei Fan
Cewu Lu
Lizhuang Ma
Yulong Chen
Y. Li
56
2
0
27 Dec 2024
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric
Zhichao Zhang
Wei Sun
Xinyue Li
Yunhao Li
Qihang Ge
...
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
EGVM
117
1
0
25 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao
Gen Li
Shreyank N. Gowda
Robert B Fisher
Jonathan Huang
Anurag Arnab
Laura Sevilla-Lara
98
0
0
20 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Saeed Mian
Mohit Bansal
Chen Chen
LRM
59
1
0
15 Nov 2024
Situational Scene Graph for Structured Human-centric Situation Understanding
Chinthani Sugandhika
Chen Li
Deepu Rajan
Basura Fernando
140
1
0
30 Oct 2024
Query matching for spatio-temporal action detection with query-based object detector
Shimon Hori
Kazuki Omi
Toru Tamaki
31
0
0
27 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
29
0
0
02 Sep 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation
Tz-Ying Wu
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
EgoV
55
0
0
28 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective
Zeen Song
Jingyao Wang
Jianqi Zhang
Changwen Zheng
Wenwen Qiang
SSL
56
0
0
19 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models
Jiawei Wang
Liping Yuan
Yuchen Zhang
38
52
0
30 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
34
0
0
11 Jun 2024
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation
D. Doukhan
Christine Maertens
William Le Personnic
Ludovic Speroni
Reda Dehak
30
2
0
06 Jun 2024
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos
Masashi Hatano
Ryo Hachiuma
Hideo Saito
EgoV
31
3
0
30 May 2024
SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences
Ali Emre Keskin
H. Keles
SLR
33
0
0
05 May 2024
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Sai Kumar Dwivedi
Yu Sun
Priyanka Patel
Yao Feng
Michael J. Black
3DH
42
27
0
25 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
34
3
0
12 Apr 2024
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
34
0
0
08 Apr 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
36
29
0
20 Feb 2024
Semi-supervised Active Learning for Video Action Detection
Aayush Singh
A. J. Rana
Akash Kumar
Shruti Vyas
Y. S. Rawat
30
7
0
12 Dec 2023
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen
Pha Nguyen
Khoa Luu
22
12
0
05 Dec 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
25
58
0
27 Nov 2023
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashanka Venkataramanan
Mamshad Nayeem Rizve
João Carreira
Yuki M. Asano
Yannis Avrithis
SSL
29
18
0
12 Oct 2023
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
32
1
0
20 Sep 2023
Reconstructing Three-Dimensional Models of Interacting Humans
Mihai Fieraru
M. Zanfir
Elisabeta Oneata
A. Popa
Vlad Olaru
C. Sminchisescu
3DH
21
5
0
03 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection
Peng Wang
Fanwei Zeng
Yu Qian
26
5
0
03 Aug 2023
ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour
Samy Tafasca
Anshul Gupta
J. Odobez
39
18
0
04 Jul 2023
CVB: A Video Dataset of Cattle Visual Behaviors
Ali Zia
Renuka Sharma
Reza Arablouei
G. Bishop-Hurley
Jody McNally
N. Bagnall
V. Rolland
Brano Kusy
L. Petersson
A. Ingham
23
2
0
26 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review
Zihan Wang
Yang Yang
Zhi Liu
Y. Zheng
53
4
0
25 May 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Pha Nguyen
Kha Gia Quach
Kris M. Kitani
Khoa Luu
39
18
0
22 May 2023
HICO-DET-SG and V-COCO-SG: New Data Splits for Evaluating the Systematic Generalization Performance of Human-Object Interaction Detection Models
Kenta Takemoto
Moyuru Yamada
Tomotake Sasaki
H. Akima
35
0
0
17 May 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers
A. Gritsenko
Xuehan Xiong
Josip Djolonga
Mostafa Dehghani
Chen Sun
Mario Lucic
Cordelia Schmid
Anurag Arnab
ViT
32
13
0
24 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
36
14
0
17 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
31
19
0
05 Apr 2023
Bodily expressed emotion understanding through integrating Laban movement analysis
Chenyan Wu
Dolzodmaa Davaasuren
T. Shafir
Rachelle Tsachor
James Z. Wang
30
6
0
05 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
30
30
0
03 Apr 2023
DOAD: Decoupled One Stage Action Detection Network
Shuning Chang
Pichao Wang
Fan Wang
Jiashi Feng
Mike Zheng Shou
13
4
0
01 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
29
7
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
43
154
0
28 Mar 2023
iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations
Zhutian Chen
Qisen Yang
Jiarui Shan
Tica Lin
Johanna Beyer
Haijun Xia
Hanspeter Pfister
19
28
0
06 Mar 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
12
7
0
16 Feb 2023
YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection
Jianhua Yang
Kun Dai
ObjD
18
17
0
14 Feb 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection
C. Nwoye
Tong Yu
Saurav Sharma
Aditya Murali
Deepak Alapatt
...
Pietro Mascagni
B. Seeliger
Cristians Gonzalez
Didier Mutter
N. Padoy
30
17
0
13 Feb 2023
Context Understanding in Computer Vision: A Survey
Xuan Wang
Zhigang Zhu
16
45
0
10 Feb 2023
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms
Pierre-Etienne Martin
14
1
0
06 Feb 2023
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022
Pierre-Etienne Martin
J. Calandre
Boris Mansencal
J. Benois-Pineau
Renaud Péteri
L. Mascarilla
J. Morlier
21
4
0
31 Jan 2023