arXiv: 2110.10596
Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos
Reuben Tan, Bryan A. Plummer, Kate Saenko, Hailin Jin, Bryan C. Russell
20 October 2021
Tags: SSL
Papers citing "Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos" (8 papers shown)
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
13 Mar 2025
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue, Kristen Grauman
Tags: EgoV
08 Jun 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, D. Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James R. Glass, Hilde Kuehne
29 Mar 2023
Referring Multi-Object Tracking
Dongming Wu, Wencheng Han, Tiancai Wang, Xingping Dong, Xiangyu Zhang, Jianbing Shen
06 Mar 2023
Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures
Nannan Li, Kevin J. Shih, Bryan A. Plummer
04 Oct 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti, Noël Vouitsis, Junwei Ma, Keyvan Golestan, M. Volkovs, Animesh Garg, Guangwei Yu
28 Mar 2022
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen, Haoqi Fan, Ross B. Girshick, Kaiming He
Tags: SSL
09 Mar 2020
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean
Tags: 3DV
16 Jan 2013