HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

1 June 2024

Minh-Triet Tran

Ngan Le

Papers citing "HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model"

GroupViT: Semantic Segmentation Emerges from Text Supervision. Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, X. Wang. 22 Feb 2022.
Ego4D: Around the World in 3,000 Hours of Egocentric Video. Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik. 13 Oct 2021.
Natural Language Video Localization: A Revisit in Span-based Question Answering Framework. Hao Zhang, Aixin Sun, Wei Jing, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh. 26 Feb 2021.
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius, Heng Wang, Lorenzo Torresani. 09 Feb 2021.
Categorical Reparameterization with Gumbel-Softmax. Eric Jang, S. Gu, Ben Poole. 03 Nov 2016.