1804.05113
Multilevel Language and Vision Integration for Text-to-Clip Retrieval
13 April 2018
Huijuan Xu
Kun He
Bryan A. Plummer
Leonid Sigal
Stan Sclaroff
Kate Saenko
CLIP
Papers citing
"Multilevel Language and Vision Integration for Text-to-Clip Retrieval"
50 / 160 papers shown
Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict
Chaochen Wu
Guan Luo
Meiyun Zuo
Zhitao Fan
174
0
0
01 Nov 2025
Empower Words: DualGround for Structured Phrase and Sentence-Level Temporal Grounding
Minseok Kang
M. Lee
Minjung Kim
Donghyeong Kim
Sangyoun Lee
154
1
0
23 Oct 2025
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei
Jiajin Tang
Sibei Yang
VLM
198
1
0
22 Oct 2025
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
Zhuo Cao
Heming Du
Bingqing Zhang
Xin Yu
Xue Li
Sen Wang
162
1
0
20 Oct 2025
An empirical study of the effect of video encoders on Temporal Video Grounding
Ignacio M. Jara
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Felipe Bravo-Marquez
171
0
0
19 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
250
1
0
12 Oct 2025
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang
Zhengxuan Wei
Yuchen Zhu
Cheng Shi
Guanbin Li
Sibei Yang
PINN
357
3
0
28 Sep 2025
Video-LLMs with Temporal Visual Screening
Zheyu Fan
Jiateng Liu
Xicheng Zhang
Zihan Wang
Yi R. Fung
Manling Li
266
2
0
27 Aug 2025
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng
Jiaqi Mao
Minghao Lai
Minh Hieu Phan
Yanjie Dong
Wei Wang
Qi Chen
Xiping Hu
187
0
0
16 Aug 2025
Denoise-then-Retrieve: Text-Conditioned Video Denoising for Video Moment Retrieval
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Weijia Liu
Jiuxin Cao
Bo Miao
Zhiheng Fu
Xuelin Zhu
Jiawei Ge
Bo Liu
Mehwish Nasim
Lin Wang
DiffM
VGen
198
0
0
15 Aug 2025
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization
Zirui Shang
Xinxiao Wu
Shuo Yang
236
0
0
30 May 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
Computer Vision and Pattern Recognition (CVPR), 2025
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
315
8
0
07 May 2025
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization
Zhuo Tao
Liang Li
Qi Chen
Yunbin Tu
Zheng-Jun Zha
Ming-Hsuan Yang
Yuankai Qi
Qingming Huang
278
0
0
22 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
392
4
0
09 Mar 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Computer Vision and Pattern Recognition (CVPR), 2022
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
416
34
0
02 Jan 2025
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Zhuo Cao
Bingqing Zhang
Heming Du
Xin Yu
Xue Li
Sen Wang
392
20
0
18 Dec 2024
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
ACM Multimedia (MM), 2024
Jongbhin Woo
H. Ryu
Youngjoon Jang
Jae-Won Cho
Joon Son Chung
255
5
0
17 Oct 2024
Grounding is All You Need? Dual Temporal Grounding for Video Dialog
You Qin
Wei Ji
Xinze Lan
Hao Fei
Xun Yang
Dan Guo
Roger Zimmermann
Lizi Liao
VGen
351
2
0
08 Oct 2024
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
270
2
0
13 Aug 2024
From Attributes to Natural Language: A Survey and Foresight on Text-based Person Re-identification
Fanzhi Jiang
Su Yang
Mark W. Jones
Liumei Zhang
355
9
0
31 Jul 2024
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Yiyang Jiang
Wengyu Zhang
Xu-Lu Zhang
Xiaoyong Wei
Chang Wen Chen
Qing Li
424
31
0
21 Jul 2024
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang
Frederic Z. Zhang
Cristian Rodriguez
Yizhak Ben-Shabat
A. Cherian
Stephen Gould
403
4
0
16 Jul 2024
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng
Yujiang Pu
Shaogang Gong
Parisa Kordjamshidi
Yu Kong
AI4TS
350
3
0
06 Jul 2024
ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos
Jr-Jen Chen
Yu-Chien Liao
Hsi-Che Lin
Yu-Chu Yu
Yen-Chun Chen
Yu-Chiang Frank Wang
438
50
0
27 Jun 2024
AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding
Xing Zhang
Jiaxi Gu
Haoyu Zhao
Shicong Wang
Hang Xu
Renjing Pei
Songcen Xu
Zuxuan Wu
Yu-Gang Jiang
314
1
0
11 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
641
35
1
09 Jun 2024
SnAG: Scalable and Accurate Video Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Fangzhou Mu
Sicheng Mo
Yin Li
415
34
0
02 Apr 2024
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Chaolei Tan
Jian-Huang Lai
Wei-Shi Zheng
Jianfang Hu
AI4TS
419
10
0
18 Mar 2024
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement
Danyang Hou
Liang Pang
Huawei Shen
Xueqi Cheng
377
9
0
21 Feb 2024
Event-aware Video Corpus Moment Retrieval
Danyang Hou
Liang Pang
Huawei Shen
Xueqi Cheng
344
4
0
21 Feb 2024
Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization
Chongzhi Zhang
Mingyuan Zhang
Zhiyang Teng
Jiayi Li
Xizhou Zhu
Lewei Lu
Ziwei Liu
Aixin Sun
DiffM
VGen
199
1
0
16 Jan 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
AAAI Conference on Artificial Intelligence (AAAI), 2024
Hao Sun
Mingyao Zhou
Wenjing Chen
Wei Xie
PINN
3DGS
ViT
319
79
0
04 Jan 2024
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
437
11
0
21 Dec 2023
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding
Haifeng Huang
Yang Zhao
Zehan Wang
Yan Xia
Zhou Zhao
302
1
0
21 Dec 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
European Conference on Computer Vision (ECCV), 2024
Pilhyeon Lee
Hyeran Byun
378
32
0
30 Nov 2023
Query by Activity Video in the Wild
IEEE International Conference on Image Processing (ICIP), 2023
Tao Hu
William Thong
Pascal Mettes
Cees G. M. Snoek
308
0
0
23 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
484
20
0
15 Nov 2023
Learning Temporal Sentence Grounding From Narrated EgoVideos
British Machine Vision Conference (BMVC), 2023
Kevin Flanagan
Dima Damen
Michael Wray
243
3
0
26 Oct 2023
Exploring Iterative Refinement with Diffusion Models for Video Grounding
IEEE International Conference on Multimedia and Expo (ICME), 2023
Xiao Liang
Tao Shi
Yaoyuan Liang
Te Tao
Shao-Lun Huang
DiffM
331
2
0
26 Oct 2023
NEUCORE: Neural Concept Reasoning for Composed Image Retrieval
Shu Zhao
Huijuan Xu
192
9
0
02 Oct 2023
Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding
Multimedia Systems (MS), 2023
Jiaxiu Li
Kun Li
Jia Li
Guoliang Chen
Dan Guo
Meng Wang
287
3
0
12 Sep 2023
Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Dezhao Luo
Jiabo Huang
Shaogang Gong
Hailin Jin
Yang Liu
VLM
352
21
0
01 Sep 2023
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Henghao Zhao
Kevin Qinghong Lin
Rui Yan
Zechao Li
VGen
DiffM
432
11
0
29 Aug 2023
Temporal Sentence Grounding in Streaming Videos
ACM Multimedia (ACM MM), 2023
Tian Gan
Xiao Wang
Yan Sun
Yue Yu
Qingpei Guo
Liqiang Nie
307
10
0
14 Aug 2023
Knowing Where to Focus: Event-aware Transformer for Video Grounding
IEEE International Conference on Computer Vision (ICCV), 2023
Jinhyun Jang
Jungin Park
Jin-Hwa Kim
Hyeongjun Kwon
Kwanghoon Sohn
362
99
0
14 Aug 2023
ViGT: Proposal-free Video Grounding with Learnable Token in Transformer
Science China Information Sciences (Sci China Inf Sci), 2023
Kun Li
Dan Guo
Meng Wang
ViT
177
68
0
11 Aug 2023
D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
IEEE International Conference on Computer Vision (ICCV), 2023
Hanjun Li
Xiujun Shu
Su He
Ruizhi Qiao
Wei Wen
Taian Guo
Bei Gan
Xing Sun
250
20
0
08 Aug 2023
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
IEEE International Conference on Computer Vision (ICCV), 2023
Hongxiang Li
Meng Cao
Xuxin Cheng
Yaowei Li
Zhihong Zhu
Yuexian Zou
433
32
0
26 Jul 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
Neural Information Processing Systems (NeurIPS), 2023
P. Li
Chen-Wei Xie
Hongtao Xie
Liming Zhao
Lei Zhang
Yun Zheng
Deli Zhao
Yongdong Zhang
DiffM
VGen
393
95
0
06 Jul 2023
A Survey on Video Moment Localization
ACM Computing Surveys (ACM CSUR), 2022
Meng Liu
Liqiang Nie
Yunxiao Wang
Meng Wang
Yong Rui
398
42
0
13 Jun 2023