Title
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory Santhosh Kumar Ramakrishnan Ziad Al-Halah Kristen Grauman 58 30 0 02 Jan 2023
Real-time Online Video Detection with Temporal Smoothing Transformers Yue Zhao Philipp Krahenbuhl ViT 58 35 0 19 Sep 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 372 2,713 0 28 Jan 2022
ASFormer: Transformer for Action Segmentation Fangqiu Yi Hongyu Wen Tingting Jiang ViT 50 114 0 16 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video Kristen Grauman Andrew Westbury Eugene Byrne Zachary Chavis Antonino Furnari ... Mike Zheng Shou Antonio Torralba Lorenzo Torresani Mingfei Yan Jitendra Malik EgoV 198 682 0 13 Oct 2021
The Power of Scale for Parameter-Efficient Prompt Tuning Brian Lester Rami Al-Rfou Noah Constant VPVLM 267 2,999 0 18 Apr 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions Wenhai Wang Enze Xie Xiang Li Deng-Ping Fan Kaitao Song Ding Liang Tong Lu Ping Luo Ling Shao ViT 254 2,898 0 24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Chao Jia Yinfei Yang Ye Xia Yi-Ting Chen Zarana Parekh Hieu H. Pham Quoc V. Le Yun-hsuan Sung Zhen Li Tom Duerig VLM CLIP 285 2,875 0 11 Feb 2021
Is Space-Time Attention All You Need for Video Understanding? Gedas Bertasius Heng Wang Lorenzo Torresani ViT 267 1,486 0 09 Feb 2021
BSN: Boundary Sensitive Network for Temporal Action Proposal Generation Tianwei Lin Xu Zhao Haisheng Su Chongjing Wang Ming Yang 122 646 0 08 Jun 2018
Aggregated Residual Transformations for Deep Neural Networks Saining Xie Ross B. Girshick Piotr Dollár Z. Tu Kaiming He 252 6,278 0 16 Nov 2016