AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding

19 June 2024

Alessandro Suglia

Ioannis Papaioannou

Ioannis Konstas

Papers citing "AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding"

7 / 7 papers shown

Title
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, D. Luvizon, Zhe Cao, Lingjie Liu, Thabo Beeler, Christian Theobalt
EgoV | 45 0 0 | 11 Apr 2025

ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu, Shanmukha Vellamchetti, Sathyanarayanan N. Aakur
EgoV | 50 0 0 | 04 Apr 2025

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su, Yi Wang, Qiongyang Hu, Chuang Yang, Lap-Pui Chau
45 0 0 | 02 Apr 2025

UrbanVideo-Bench: Benchmarking Vision-Language Models on Embodied Intelligence with Video Data in Urban Spaces
Baining Zhao, Jianjie Fang, Zichao Dai, Z. Wang, Jirong Zha, ..., Chen Gao, Y. Wang, Jinqiang Cui, Xinlei Chen, Y. Li
51 2 0 | 08 Mar 2025

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song, Valts Blukis, Jonathan Tremblay, Stephen Tyree, Yu-Chuan Su, Stan Birchfield
83 5 0 | 25 Nov 2024

Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
EgoV | 224 1,017 0 | 13 Oct 2021

Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu, Emanuele Bugliarello, E. Ponti, Siva Reddy, Nigel Collier, Desmond Elliott
VLM LRM | 92 167 0 | 28 Sep 2021