Multilevel Language and Vision Integration for Text-to-Clip Retrieval

13 April 2018

Papers citing "Multilevel Language and Vision Integration for Text-to-Clip Retrieval"

50 / 160 papers shown

Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict

01 Nov 2025

Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding

23 Oct 2025

Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning

22 Oct 2025

When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions

20 Oct 2025

An empirical study of the effect of video encoders on Temporal Video Grounding

Ignacio M. Jara

Cristian Rodriguez-Opazo

Edison Marrese-Taylor

Felipe Bravo-Marquez

19 Oct 2025

Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey

12 Oct 2025

Sim-DETR: Unlock DETR for Temporal Sentence Grounding

28 Sep 2025

Video-LLMs with Temporal Visual Screening

27 Aug 2025

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

16 Aug 2025

Denoise-then-Retrieve: Text-Conditioned Video Denoising for Video Moment Retrieval. International Joint Conference on Artificial Intelligence (IJCAI), 2025

15 Aug 2025

LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization

Zirui Shang

Xinxiao Wu

Shuo Yang

30 May 2025

Object-Shot Enhanced Grounding Network for Egocentric Video. Computer Vision and Pattern Recognition (CVPR), 2025

07 May 2025

Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization

22 Mar 2025

TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos

09 Mar 2025

Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation. Computer Vision and Pattern Recognition (CVPR), 2022

Akam Rahimi

Triantafyllos Afouras

Andrew Zisserman

02 Jan 2025

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

18 Dec 2024

Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding. ACM Multimedia (MM), 2024

17 Oct 2024

Grounding is All You Need? Dual Temporal Grounding for Video Dialog

Xun Yang

Dan Guo

Roger Zimmermann

Lizi Liao

VGen

08 Oct 2024

ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding

Yubin Wang

13 Aug 2024

From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification

31 Jul 2024

Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

21 Jul 2024

Temporally Grounding Instructional Diagrams in Unconstrained Videos

Yizhak Ben-Shabat

16 Jul 2024

SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding

06 Jul 2024

ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

Yu-Chiang Frank Wang

27 Jun 2024

AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

Yu-Gang Jiang

11 Jun 2024

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives. Annual Meeting of the Association for Computational Linguistics (ACL), 2024

09 Jun 2024

SnAG: Scalable and Accurate Video Grounding. Computer Vision and Pattern Recognition (CVPR), 2024

Fangzhou Mu

Sicheng Mo

Yin Li

02 Apr 2024

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding. Computer Vision and Pattern Recognition (CVPR), 2024

18 Mar 2024

Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement

21 Feb 2024

Event-aware Video Corpus Moment Retrieval

21 Feb 2024

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

Ziwei Liu

16 Jan 2024

TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection. AAAI Conference on Artificial Intelligence (AAAI), 2024

04 Jan 2024

LLM4VG: Large Language Models Evaluation for Video Grounding

21 Dec 2023

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

Zhou Zhao

21 Dec 2023

BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos. European Conference on Computer Vision (ECCV), 2023

Pilhyeon Lee

Hyeran Byun

30 Nov 2023

Query by Activity Video in the Wild. IEEE International Conference on Image Processing (ICIP), 2023

23 Nov 2023

Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding

15 Nov 2023

Learning Temporal Sentence Grounding From Narrated EgoVideos. British Machine Vision Conference (BMVC), 2023

Kevin Flanagan

Dima Damen

Michael Wray

26 Oct 2023

Exploring Iterative Refinement with Diffusion Models for Video Grounding. IEEE International Conference on Multimedia and Expo (ICME), 2023

26 Oct 2023

NEUCORE: Neural Concept Reasoning for Composed Image Retrieval

Shu Zhao

Huijuan Xu

02 Oct 2023

Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding. Multimedia Systems (MS), 2023

Jia Li

Meng Wang

12 Sep 2023

Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

Yang Liu

01 Sep 2023

DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

Henghao Zhao

Kevin Qinghong Lin

Rui Yan

Zechao Li

VGen DiffM

29 Aug 2023

Temporal Sentence Grounding in Streaming Videos. ACM Multimedia (ACM MM), 2023

14 Aug 2023

Knowing Where to Focus: Event-aware Transformer for Video Grounding. IEEE International Conference on Computer Vision (ICCV), 2023

14 Aug 2023

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer. Science China Information Sciences (Sci China Inf Sci), 2023

Kun Li

Dan Guo

Meng Wang

ViT

11 Aug 2023

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation. IEEE International Conference on Computer Vision (ICCV), 2023

Xing Sun

08 Aug 2023

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory. IEEE International Conference on Computer Vision (ICCV), 2023

26 Jul 2023

MomentDiff: Generative Video Moment Retrieval from Random to Real. Neural Information Processing Systems (NeurIPS), 2023

06 Jul 2023

A Survey on Video Moment Localization. ACM Computing Surveys (ACM CSUR), 2022

Meng Wang

13 Jun 2023