Title
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos Arnav Chakravarthy Zhiyuan Fang Yezhou Yang 21 2 0 28 Apr 2022
Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution Tze Ho Elden Tse K. Kim A. Leonardis H. Chang 3DH 19 57 0 27 Apr 2022
Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition Kazuki Omi Jun Kimata Toru Tamaki 21 7 0 15 Apr 2022
Joint Hand Motion and Interaction Hotspots Prediction from Egocentric Videos Shao-Wei Liu Subarna Tripathi Somdeb Majumdar Xiaolong Wang EgoV 20 93 0 04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language Andy Zeng Maria Attarian Brian Ichter K. Choromanski Adrian S. Wong ... Michael S. Ryoo Vikas Sindhwani Johnny Lee Vincent Vanhoucke Peter R. Florence ReLM LRM 13 569 0 01 Apr 2022
Understanding 3D Object Articulation in Internet Videos Shengyi Qian Linyi Jin C. Rockwell Siyi Chen David Fouhey 25 28 0 30 Mar 2022
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities Fadime Sener Dibyadip Chatterjee Daniel Shelepov Kun He Dipika Singhania Robert Y. Wang Angela Yao VGen 19 203 0 28 Mar 2022
Discovering Human-Object Interaction Concepts via Self-Compositional Learning Zhi Hou Baosheng Yu Dacheng Tao 17 18 0 27 Mar 2022
How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs Hazel Doughty Cees G. M. Snoek 20 19 0 23 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos Tomáš Souček Jean-Baptiste Alayrac Antoine Miech Ivan Laptev Josef Sivic 19 32 0 22 Mar 2022
Learnable Irrelevant Modality Dropout for Multimodal Action Recognition on Modality-Specific Annotated Videos Saghir Alfasly Jian Lu C. Xu Yuru Zou 34 18 0 06 Mar 2022
HOI4D: A 4D Egocentric Dataset for Category-Level Human-Object Interaction Yunze Liu Yun-Hai Liu Chen Jiang Kangbo Lyu Weikang Wan Hao Shen Bo-Hua Liang Zhoujie Fu He-Nan Wang Li Yi 45 172 0 03 Mar 2022
Should I take a walk? Estimating Energy Expenditure from Video Data Kunyu Peng Alina Roitberg Kailun Yang Jiaming Zhang Rainer Stiefelhagen 11 4 0 01 Feb 2022
InstaIndoor and Multi-modal Deep Learning for Indoor Scene Recognition A. Glavan Estefanía Talavera 13 10 0 23 Dec 2021
EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices Siwei Zhang Qianli Ma Yan Zhang Zhiyin Qian Taein Kwon Marc Pollefeys Federica Bogo Siyu Tang 24 92 0 14 Dec 2021
Self-Regulated Learning for Egocentric Video Activity Anticipation Zhaobo Qi Shuhui Wang Chi Su Li Su Qingming Huang Q. Tian EgoV 37 52 0 23 Nov 2021
Shaping embodied agent behavior with activity-context priors from egocentric video Tushar Nagarajan Kristen Grauman EgoV LM&Ro 38 13 0 14 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 224 1,018 0 13 Oct 2021
How much human-like visual experience do current self-supervised learning algorithms need in order to achieve human-level object recognition? Emin Orhan OOD 30 4 0 23 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention Katsuyuki Nakamura Hiroki Ohashi Mitsuhiro Okada EgoV 31 12 0 07 Sep 2021
SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos Nada Osman Guglielmo Camporese Pasquale Coscia Lamberto Ballan EgoV 31 20 0 02 Sep 2021
Is First Person Vision Challenging for Object Tracking? Matteo Dunnhofer Antonino Furnari G. Farinella C. Micheloni 17 23 0 31 Aug 2021
PALMAR: Towards Adaptive Multi-inhabitant Activity Recognition in Point-Cloud Technology Mohammad Arif Ul Alam M. Rahman Jared Q. Widberg 11 21 0 22 Jun 2021
A Survey on Human-aware Robot Navigation Ronja Möller Antonino Furnari S. Battiato Aki Härmä G. Farinella 31 87 0 22 Jun 2021
Space-time Mixing Attention for Video Transformer Adrian Bulat Juan-Manuel Perez-Rua Swathikiran Sudhakaran Brais Martínez Georgios Tzimiropoulos ViT 25 124 0 10 Jun 2021
Modeling long-term interactions to enhance action recognition Alejandro Cartas P. Radeva Mariella Dimiccoli EgoV 14 6 0 23 Apr 2021
Spatiotemporal Deformable Scene Graphs for Complex Activity Detection Salman Khan Fabio Cuzzolin 3DPC 35 5 0 16 Apr 2021
Unidentified Video Objects: A Benchmark for Dense, Open-World Segmentation Weiyao Wang Matt Feiszli Heng Wang Du Tran VOS 12 123 0 10 Apr 2021
Learning to Recognize Actions on Objects in Egocentric Video with Attention Dictionaries Swathikiran Sudhakaran Sergio Escalera O. Lanz EgoV 25 15 0 16 Feb 2021
Temporal-Relational CrossTransformers for Few-Shot Action Recognition Toby Perrett A. Masullo T. Burghardt Majid Mirmehdi Dima Damen ViT 9 145 0 15 Jan 2021
Learning to Anticipate Egocentric Actions by Imagination Yu Wu Linchao Zhu Xiaohan Wang Yi Yang Fei Wu EgoV 77 69 0 13 Jan 2021
Learning Temporal Dynamics from Cycles in Narrated Video Dave Epstein Jiajun Wu Cordelia Schmid Chen Sun AI4TS 28 14 0 07 Jan 2021
Semantics for Robotic Mapping, Perception and Interaction: A Survey Sourav Garg Niko Sünderhauf Feras Dayoub D. Morrison Akansel Cosgun ... Tat-Jun Chin Ian Reid Stephen Gould Peter Corke Michael Milford 11 115 0 02 Jan 2021
Automated acquisition of structured, semantic models of manipulation activities from human VR demonstration Andrei Haidu Michael Beetz 9 10 0 27 Nov 2020
4D Human Body Capture from Egocentric Video via 3D Scene Grounding Miao Liu Dexin Yang Yan Zhang Zhaopeng Cui James M. Rehg Siyu Tang 8 38 0 26 Nov 2020
Whose hand is this? Person Identification from Egocentric Hand Gestures Satoshi Tsutsui Yanwei Fu David J. Crandall EgoV 6 7 0 17 Nov 2020
ActBERT: Learning Global-Local Video-Text Representations Linchao Zhu Yi Yang ViT 29 417 0 14 Nov 2020
Robust and efficient post-processing for video object detection Alberto Sabater Luis Montesano Ana C. Murillo 4 49 0 23 Sep 2020
HAA500: Human-Centric Atomic Action Dataset with Curated Videos Jihoon Chung Cheng-hsin Wuu Hsuan-ru Yang Yu-Wing Tai Chi-Keung Tang 13 43 0 11 Sep 2020
Performance of object recognition in wearable videos Alberto Sabater Luis Montesano Ana C. Murillo 14 2 0 10 Sep 2020
TinyVIRAT: Low-resolution Video Action Recognition Ugur Demir Y. S. Rawat M. Shah 20 35 0 14 Jul 2020
Learning Interactions and Relationships between Movie Characters Anna Kukleva Makarand Tapaswi Ivan Laptev 36 51 0 29 Mar 2020
PIC: Permutation Invariant Convolution for Recognizing Long-range Activities Noureldien Hussein E. Gavves A. Smeulders VLM 13 13 0 18 Mar 2020
Hand-Priming in Object Localization for Assistive Egocentric Vision Kyungjun Lee Abhinav Shrivastava Hernisa Kacorri EgoV 11 16 0 28 Feb 2020
Audiovisual SlowFast Networks for Video Recognition Fanyi Xiao Yong Jae Lee Kristen Grauman Jitendra Malik Christoph Feichtenhofer 192 205 0 23 Jan 2020
Analysis of the hands in egocentric vision: A survey A. Bandini José Zariffa EgoV 14 71 0 23 Dec 2019
Action Genome: Actions as Composition of Spatio-temporal Scene Graphs Jingwei Ji Ranjay Krishna Li Fei-Fei Juan Carlos Niebles 39 335 0 15 Dec 2019
Action Modifiers: Learning from Adverbs in Instructional Videos Hazel Doughty Ivan Laptev W. Mayol-Cuevas Dima Damen 10 30 0 13 Dec 2019
Self-Supervised Learning by Cross-Modal Audio-Video Clustering Humam Alwassel D. Mahajan Bruno Korbar Lorenzo Torresani Bernard Ghanem Du Tran SSL 14 428 0 28 Nov 2019
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition Evangelos Kazakos Arsha Nagrani Andrew Zisserman Dima Damen EgoV 14 0 0 22 Aug 2019