2406.00307
Cited By
HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model
1 June 2024
Khoa T. Vo, Thinh Phan, Kashu Yamazaki, Minh-Triet Tran, Ngan Le
Papers citing "HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model"
5 / 5 papers shown
GroupViT: Semantic Segmentation Emerges from Text Supervision
Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, X. Wang
ViT, VLM
155 380 0
22 Feb 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, ..., Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
EgoV
190 682 0
13 Oct 2021

Natural Language Video Localization: A Revisit in Span-based Question Answering Framework
Hao Zhang, Aixin Sun, Wei Jing, Liangli Zhen, Joey Tianyi Zhou, Rick Siow Mong Goh
78 73 0
26 Feb 2021

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
ViT
261 1,486 0
09 Feb 2021

Categorical Reparameterization with Gumbel-Softmax
Eric Jang, S. Gu, Ben Poole
BDL
47 4,781 0
03 Nov 2016