Title
CineVerse: Consistent Keyframe Synthesis for Cinematic Scene Composition Quynh Phung Long Mai Fabian Caba Heilbron Feng Liu Jia-Bin Huang Cusuh Ham DiffM VGen CoGe 108 0 0 28 Apr 2025
Post-processing for Fair Regression via Explainable SVD Zhiqun Zuo Ding Zhu Mohammad Mahdi Khalili 146 0 0 04 Apr 2025
Action tube generation by person query matching for spatio-temporal action detection Kazuki Omi Jion Oshima Toru Tamaki 60 0 0 17 Mar 2025
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis Amir Hosein Fadaei M. Dehaqani 42 0 0 11 Feb 2025
Interacted Object Grounding in Spatio-Temporal Human-Object Interactions Xiaoyang Liu Boran Wen Xinpeng Liu Zizheng Zhou Hongwei Fan Cewu Lu Lizhuang Ma Yulong Chen Y. Li 56 2 0 27 Dec 2024
Human-Activity AGV Quality Assessment: A Benchmark Dataset and an Objective Evaluation Metric Zhichao Zhang Wei Sun Xinyue Li Yunhao Li Qihang Ge ... Zhongpeng Ji Fengyu Sun Shangling Jui Xiongkuo Min Guangtao Zhai EGVM 117 1 0 25 Nov 2024
Principles of Visual Tokens for Efficient Video Understanding Xinyue Hao Gen Li Shreyank N. Gowda Robert B Fisher Jonathan Huang Anurag Arnab Laura Sevilla-Lara 98 0 0 20 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level Andong Deng Tongjia Chen Shoubin Yu Taojiannan Yang Lincoln Spencer Yapeng Tian Ajmal Saeed Mian Mohit Bansal Chen Chen LRM 59 1 0 15 Nov 2024
Situational Scene Graph for Structured Human-centric Situation Understanding Chinthani Sugandhika Chen Li Deepu Rajan Basura Fernando 140 1 0 30 Oct 2024
Query matching for spatio-temporal action detection with query-based object detector Shimon Hori Kazuki Omi Toru Tamaki 31 0 0 27 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline Zhuolin Tan Chenqiang Gao Anyong Qin Ruixin Chen Tiecheng Song Feng Yang Deyu Meng 29 0 0 02 Sep 2024
Ego-VPA: Egocentric Video Understanding with Parameter-efficient Adaptation Tz-Ying Wu Kyle Min Subarna Tripathi Nuno Vasconcelos EgoV 55 0 0 28 Jul 2024
Self-Supervised Video Representation Learning in a Heuristic Decoupled Perspective Zeen Song Jingyao Wang Jianqi Zhang Changwen Zheng Wenwen Qiang SSL 56 0 0 19 Jul 2024
Tarsier: Recipes for Training and Evaluating Large Video Description Models Jiawei Wang Liping Yuan Yuchen Zhang 38 52 0 30 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD Ioanna Ntinou Enrique Sanchez Georgios Tzimiropoulos 34 0 0 11 Jun 2024
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation D. Doukhan Christine Maertens William Le Personnic Ludovic Speroni Reda Dehak 30 2 0 06 Jun 2024
EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos Masashi Hatano Ryo Hachiuma Hideo Saito EgoV 31 3 0 30 May 2024
SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences Ali Emre Keskin H. Keles SLR 33 0 0 05 May 2024
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation Sai Kumar Dwivedi Yu Sun Priyanka Patel Yao Feng Michael J. Black 3DH 42 27 0 25 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis Masahiro Yasuda Noboru Harada Yasunori Ohishi Shoichiro Saito Akira Nakayama Nobutaka Ono 34 3 0 12 Apr 2024
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning Mahsa Ehsanpour Ian Reid Hamid Rezatofighi ViT 34 0 0 08 Apr 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding Ahmad A Mahmood Ashmal Vayani Muzammal Naseer Salman Khan Fahad Shahbaz Khan LRM 49 7 0 21 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding Long Zhao N. B. Gundavarapu Liangzhe Yuan Hao Zhou Shen Yan ... Huisheng Wang Hartwig Adam Mikhail Sirotenko Ting Liu Boqing Gong VGen 36 29 0 20 Feb 2024
Semi-supervised Active Learning for Video Action Detection Aayush Singh A. J. Rana Akash Kumar Shruti Vyas Y. S. Rawat 30 7 0 12 Dec 2023
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding Trong-Thuan Nguyen Pha Nguyen Khoa Luu 22 12 0 05 Dec 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models Munan Ning Bin Zhu Yujia Xie Bin Lin Jiaxi Cui Lu Yuan Dongdong Chen Li-ming Yuan ELM MLLM 25 58 0 27 Nov 2023
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video Shashanka Venkataramanan Mamshad Nayeem Rizve João Carreira Yuki M. Asano Yannis Avrithis SSL 29 18 0 12 Oct 2023
SkeleTR: Towrads Skeleton-based Action Recognition in the Wild Haodong Duan Mingze Xu Bing Shuai Davide Modolo Zhuowen Tu Joseph Tighe Alessandro Bergamo ViT 32 1 0 20 Sep 2023
Reconstructing Three-Dimensional Models of Interacting Humans Mihai Fieraru M. Zanfir Elisabeta Oneata A. Popa Vlad Olaru C. Sminchisescu 3DH 21 5 0 03 Aug 2023
A Survey on Deep Learning-based Spatio-temporal Action Detection Peng Wang Fanwei Zeng Yu Qian 26 5 0 03 Aug 2023
ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour Samy Tafasca Anshul Gupta J. Odobez 39 18 0 04 Jul 2023
CVB: A Video Dataset of Cattle Visual Behaviors Ali Zia Renuka Sharma Reza Arablouei G. Bishop-Hurley Jody McNally N. Bagnall V. Rolland Brano Kusy L. Petersson A. Ingham 23 2 0 26 May 2023
Deep Neural Networks in Video Human Action Recognition: A Review Zihan Wang Yang Yang Zhi Liu Y. Zheng 53 4 0 25 May 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking Pha Nguyen Kha Gia Quach Kris M. Kitani Khoa Luu 39 18 0 22 May 2023
HICO-DET-SG and V-COCO-SG: New Data Splits for Evaluating the Systematic Generalization Performance of Human-Object Interaction Detection Models Kenta Takemoto Moyuru Yamada Tomotake Sasaki H. Akima 35 0 0 17 May 2023
End-to-End Spatio-Temporal Action Localisation with Video Transformers A. Gritsenko Xuehan Xiong Josip Djolonga Mostafa Dehghani Chen Sun Mario Lucic Cordelia Schmid Anurag Arnab ViT 32 13 0 24 Apr 2023
Efficient Video Action Detection with Token Dropout and Context Refinement Lei Chen Zhan Tong Yibing Song Gangshan Wu Limin Wang 36 14 0 17 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition Kumara Kahatapitiya Anurag Arnab Arsha Nagrani Michael S. Ryoo 31 19 0 05 Apr 2023
Bodily expressed emotion understanding through integrating Laban movement analysis Chenyan Wu Dolzodmaa Davaasuren T. Shafir Rachelle Tsachor James Z. Wang 30 6 0 05 Apr 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition Jathushan Rajasegaran Georgios Pavlakos Angjoo Kanazawa Christoph Feichtenhofer Jitendra Malik 30 30 0 03 Apr 2023
DOAD: Decoupled One Stage Action Detection Network Shuning Chang Pichao Wang Fan Wang Jiashi Feng Mike Zheng Show 13 4 0 01 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions Brian Chen Nina Shvetsova Andrew Rouditchenko D. Kondermann Samuel Thomas Shih-Fu Chang Rogerio Feris James R. Glass Hilde Kuehne 29 7 0 29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models Kunchang Li Yali Wang Yizhuo Li Yi Wang Yinan He Limin Wang Yu Qiao VGen 43 154 0 28 Mar 2023
iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations Zhutian Chen Qisen Yang Jiarui Shan Tica Lin Johanna Beyer Haijun Xia Hanspeter Pfister 19 28 0 06 Mar 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries Raghav Goyal E. Mavroudi Xitong Yang Sainbayar Sukhbaatar Leonid Sigal Matt Feiszli Lorenzo Torresani Du Tran 12 7 0 16 Feb 2023
YOWOv2: A Stronger yet Efficient Multi-level Detection Framework for Real-time Spatio-temporal Action Detection Jianhua Yang Kun Dai ObjD 18 17 0 14 Feb 2023
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection C. Nwoye Tong Yu Saurav Sharma Aditya Murali Deepak Alapatt ... Pietro Mascagni B. Seeliger Cristians Gonzalez Didier Mutter N. Padoy 30 17 0 13 Feb 2023
Context Understanding in Computer Vision: A Survey Xuan Wang Zhigang Zhu 16 45 0 10 Feb 2023
Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms Pierre-Etienne Martin 14 1 0 06 Feb 2023
Sport Task: Fine Grained Action Detection and Classification of Table Tennis Strokes from Videos for MediaEval 2022 Pierre-Etienne Martin J. Calandre Boris Mansencal J. Benois-Pineau Renaud Péteri L. Mascarilla J. Morlier 21 4 0 31 Jan 2023