ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.10596
  4. Cited By
Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations
  in Instructional Videos

Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos

20 October 2021
Reuben Tan
Bryan A. Plummer
Kate Saenko
Hailin Jin
Bryan C. Russell
    SSL
ArXivPDFHTML

Papers citing "Look at What I'm Doing: Self-Supervised Spatial Grounding of Narrations in Instructional Videos"

8 / 8 papers shown
Title
Large-scale Pre-training for Grounded Video Caption Generation
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
59
0
0
13 Mar 2025
Learning Fine-grained View-Invariant Representations from Unpaired
  Ego-Exo Videos via Temporal Alignment
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment
Zihui Xue
Kristen Grauman
EgoV
31
30
0
08 Jun 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in
  Untrimmed Multi-Action Videos from Narrated Instructions
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
29
7
0
29 Mar 2023
Referring Multi-Object Tracking
Referring Multi-Object Tracking
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
24
71
0
06 Mar 2023
Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose
  Transfer by Permuting Textures
Collecting The Puzzle Pieces: Disentangled Self-Driven Human Pose Transfer by Permuting Textures
Nannan Li
Kevin J. Shih
Bryan A. Plummer
29
7
0
04 Oct 2022
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
25
148
0
28 Mar 2022
Improved Baselines with Momentum Contrastive Learning
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
267
3,369
0
09 Mar 2020
Efficient Estimation of Word Representations in Vector Space
Efficient Estimation of Word Representations in Vector Space
Tomáš Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
230
31,253
0
16 Jan 2013
1