Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2019

6 June 2019

Zhou Zhao

Papers citing "Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos"

50 / 87 papers shown

When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions

166

20 Oct 2025

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Triantafyllos Afouras

277

19 Oct 2025

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

190

16 Aug 2025

MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval

348

30 Dec 2024

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

397

18 Dec 2024

Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild

362

01 Dec 2024

Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

434

21 Jul 2024

Context-Enhanced Video Moment Retrieval with Large Language Models

Bo Liu

327

21 May 2024

Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization

Ziwei Liu

202

16 Jan 2024

Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video

AAAI Conference on Artificial Intelligence (AAAI), 2024

366

15 Jan 2024

Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding

Zhou Zhao

303

21 Dec 2023

BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos

European Conference on Computer Vision (ECCV), 2023

Pilhyeon Lee

Hyeran Byun

381

30 Nov 2023

Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding

494

15 Nov 2023

Exploring Iterative Refinement with Diffusion Models for Video Grounding

IEEE International Conference on Multimedia and Expo (ICME), 2023

334

26 Oct 2023

Knowing Where to Focus: Event-aware Transformer for Video Grounding

IEEE International Conference on Computer Vision (ICCV), 2023

368

102

14 Aug 2023

ViGT: Proposal-free Video Grounding with Learnable Token in Transformer

Science China Information Sciences (Sci China Inf Sci), 2023

Kun Li

Dan Guo

Meng Wang

ViT

178

11 Aug 2023

Encode-Store-Retrieve: Enhancing Memory Augmentation through Language-Encoded Egocentric Perception

International Symposium on Mixed and Augmented Reality (ISMAR), 2023

Junxiao Shen

John J. Dudley

Per Ola Kristensson

RALM

145

10 Aug 2023

G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory

IEEE International Conference on Computer Vision (ICCV), 2023

444

26 Jul 2023

A Survey on Video Moment Localization

ACM Computing Surveys (ACM CSUR), 2022

Meng Wang

406

13 Jun 2023

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

Weining Lu

327

06 May 2023

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

IEEE International Conference on Computer Vision (ICCV), 2023

Yuxin Peng

255

15 Mar 2023

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

Computer Vision and Pattern Recognition (CVPR), 2023

269

14 Mar 2023

Generation-Guided Multi-Level Unified Network for Video Grounding

Fan Yang

286

14 Mar 2023

Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos

Ran Cheng

Ping Luo

VLM

318

11 Mar 2023

Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Daizong Liu

Pan Zhou

VOS

349

02 Mar 2023

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Zeyu Xiong

Daizong Liu

Pan Zhou

Jiahao Zhu

345

21 Feb 2023

Constraint and Union for Partially-Supervised Temporal Sentence Grounding

227

20 Feb 2023

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

AAAI Conference on Artificial Intelligence (AAAI), 2023

Weining Lu

282

05 Jan 2023

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Jiahao Zhu

...

Lichao Sun

241

02 Jan 2023

MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Wei Ji

204

26 Dec 2022

FedVMR: A New Federated Learning method for Video Moment Retrieval

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

240

28 Oct 2022

Fine-grained Semantic Alignment Network for Weakly Supervised Temporal Language Grounding

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

218

21 Oct 2022

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

IEEE Transactions on Multimedia (IEEE TMM), 2022

495

23 Sep 2022

Video-Guided Curriculum Learning for Spoken Video Grounding

ACM Multimedia (ACM MM), 2022

Zhou Zhao

200

01 Sep 2022

Hierarchical Local-Global Transformer for Temporal Sentence Grounding

IEEE Transactions on Multimedia (IEEE TMM), 2022

355

31 Aug 2022

Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding

European Conference on Computer Vision (ECCV), 2022

336

29 Jul 2022

Reducing the Vision and Language Bias for Temporal Sentence Grounding

ACM Multimedia (ACM MM), 2022

Daizong Liu

Xiaoye Qu

Wei Hu

293

27 Jul 2022

Skimming, Locating, then Perusing: A Human-Like Framework for Natural Language Video Localization

ACM Multimedia (ACM MM), 2022

Daizong Liu

Wei Hu

250

27 Jul 2022

You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022

250

25 May 2022

Video Moment Retrieval from Text Queries via Single Frame Annotation

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022

327

20 Apr 2022

Position-aware Location Regression Network for Temporal Video Grounding

Advanced Video and Signal Based Surveillance (AVSS), 2021

Sunoh Kim

Kimin Yun

J. Choi

250

12 Apr 2022

Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding

325

04 Apr 2022

TubeDETR: Spatio-Temporal Video Grounding with Transformers

Computer Vision and Pattern Recognition (CVPR), 2022

389

127

30 Mar 2022

End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

Zhou Zhao

...

Peng Wang

355

15 Mar 2022

A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach

216

10 Mar 2022

Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding

Shentong Mo

Daizong Liu

Wei Hu

SSL

171

08 Mar 2022

Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding

IEEE Transactions on Multimedia (IEEE TMM), 2022

248

06 Mar 2022

Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos

506

25 Jan 2022

Temporal Sentence Grounding in Videos: A Survey and Future Directions

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022

478

20 Jan 2022

Learning Sample Importance for Cross-Scenario Video Temporal Grounding

International Conference on Multimedia Retrieval (ICMR), 2022

P. Bao

Yadong Mu

190

08 Jan 2022