Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video

25 January 2020

Papers citing "Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video"

29 / 29 papers shown

ResidualViT for Efficient Temporally Dense Video Encoding

224

16 Sep 2025

Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining

492

10 May 2025

Cross-modal Causal Relation Alignment for Video Question Grounding

Computer Vision and Pattern Recognition (CVPR), 2025

343

05 Mar 2025

TimeRefine: Temporal Grounding with Time Refining Video LLM

606

12 Dec 2024

Commonsense for Zero-Shot Natural Language Video Localization

AAAI Conference on Artificial Intelligence (AAAI), 2023

Meghana Holla

Ismini Lourentzou

401

29 Dec 2023

Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos

278

28 Dec 2023

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

Zhou Zhao

303

21 Dec 2023

EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model

IEEE Transactions on Multimedia (IEEE TMM), 2023

Guozhang Li

Xinpeng Ding

De Cheng

Jie Li

Nannan Wang

Xinbo Gao

506

05 Dec 2023

Learning Temporal Sentence Grounding From Narrated EgoVideos

British Machine Vision Conference (BMVC), 2023

Kevin Flanagan

Dima Damen

Michael Wray

244

26 Oct 2023

SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval

IEEE International Conference on Computer Vision (ICCV), 2023

427

08 Oct 2023

Counterfactual Cross-modality Reasoning for Weakly Supervised Video Moment Localization

ACM Multimedia (ACM MM), 2023

Zezhong Lv

Fuchun Sun

Ji-Rong Wen

319

10 Aug 2023

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

IEEE International Conference on Computer Vision (ICCV), 2023

Xing Sun

251

08 Aug 2023

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

227

20 Feb 2023

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

AAAI Conference on Artificial Intelligence (AAAI), 2023

Weining Lu

282

05 Jan 2023

Language-free Training for Zero-shot Video Grounding

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

251

24 Oct 2022

Weakly-Supervised Temporal Article Grounding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Heng Ji

235

22 Oct 2022

Masked Motion Encoding for Self-Supervised Video Representation Learning

Computer Vision and Pattern Recognition (CVPR), 2022

Chuang Gan

402

12 Oct 2022

Dilated Context Integrated Network with Cross-Modal Consensus for Temporal Emotion Localization in Videos

ACM Multimedia (ACM MM), 2022

...

288

03 Aug 2022

Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos

Arnav Chakravarthy

Zhiyuan Fang

Yezhou Yang

185

28 Apr 2022

Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding

Shentong Mo

Daizong Liu

Wei Hu

SSL

171

08 Mar 2022

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

506

25 Jan 2022

Temporal Sentence Grounding in Videos: A Survey and Future Directions

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

478

20 Jan 2022

A Survey on Temporal Sentence Grounding in Videos

401

16 Sep 2021

Zero-shot Natural Language Video Localization

IEEE International Conference on Computer Vision (ICCV), 2021

431

29 Aug 2021

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

Neural Information Processing Systems (NeurIPS), 2020

Simon Ging

Mohammadreza Zolfaghari

Hamed Pirsiavash

Thomas Brox

ViT CLIP

279

178

01 Nov 2020

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

Zhu Zhang

Zhijie Lin

Zhou Zhao

Jieming Zhu

Xiuqiang He

187

19 Aug 2020

Weak Supervision and Referring Attention for Temporal-Textual Association Learning

Zhiyuan Fang

Shu Kong

Zhe Wang

Charless C. Fowlkes

Yezhou Yang

169

21 Jun 2020

Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos

219

16 Mar 2020

Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension

Computer Vision and Pattern Recognition (CVPR), 2020

Peng Wang

Qi Wu

299

01 Mar 2020