arXiv:1904.02755
ExCL: Extractive Clip Localization Using Natural Language Descriptions
4 April 2019
Soham Ghosh
Anuva Agarwal
Zarana Parekh
Alexander G. Hauptmann
CLIP
Papers citing
"ExCL: Extractive Clip Localization Using Natural Language Descriptions"
50 / 88 papers shown
Title
HieraMamba: Video Temporal Grounding via Hierarchical Anchor-Mamba Pooling
Joungbin An
Kristen Grauman
Mamba
233
0
0
27 Oct 2025
Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning
Zhengxuan Wei
Jiajin Tang
Sibei Yang
VLM
132
0
0
22 Oct 2025
When One Moment Isn't Enough: Multi-Moment Retrieval with Cross-Moment Interactions
Zhuo Cao
Heming Du
Bingqing Zhang
Xin Yu
Xue Li
Sen Wang
96
0
0
20 Oct 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick
E. Mavroudi
Yale Song
Rama Chellappa
Lorenzo Torresani
Triantafyllos Afouras
144
0
0
19 Oct 2025
An empirical study of the effect of video encoders on Temporal Video Grounding
Ignacio M. Jara
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Felipe Bravo-Marquez
104
0
0
19 Oct 2025
Sim-DETR: Unlock DETR for Temporal Sentence Grounding
Jiajin Tang
Zhengxuan Wei
Yuchen Zhu
Cheng Shi
Guanbin Li
Sibei Yang
PINN
240
1
0
28 Sep 2025
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng
Jiaqi Mao
Minghao Lai
Minh Hieu Phan
Yanjie Dong
Wei Wang
Qi Chen
Xiping Hu
120
0
0
16 Aug 2025
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization
Zirui Shang
Xinxiao Wu
Shuo Yang
152
0
0
30 May 2025
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization
Zhuo Tao
Liang Li
Qi Chen
Yunbin Tu
Zheng-Jun Zha
Ming-Hsuan Yang
Yuankai Qi
Qingming Huang
170
0
0
22 Mar 2025
Deep Understanding of Sign Language for Sign to Subtitle Alignment
Youngjoon Jang
Jeongsoo Choi
Junseok Ahn
Joon Son Chung
SLR
283
4
0
05 Mar 2025
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
404
5
0
12 Dec 2024
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
ACM Multimedia (ACM MM), 2024
Jongbhin Woo
H. Ryu
Youngjoon Jang
Jae-Won Cho
Joon Son Chung
185
3
0
17 Oct 2024
Temporally Grounding Instructional Diagrams in Unconstrained Videos
Jiahao Zhang
Frederic Z. Zhang
Cristian Rodriguez
Yizhak Ben-Shabat
A. Cherian
Stephen Gould
208
4
0
16 Jul 2024
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning
Quang Minh Dinh
Minh Khoi Ho
Anh Quan Dang
Hung Phong Tran
200
17
0
14 Apr 2024
SnAG: Scalable and Accurate Video Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Fangzhou Mu
Sicheng Mo
Yin Li
249
25
0
02 Apr 2024
GPTSee: Enhancing Moment Retrieval and Highlight Detection via Description-Based Similarity Features
Yunzhuo Sun
Yifang Xu
Zien Xie
Yukun Shu
Sidan Du
256
10
0
03 Mar 2024
Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization
Chongzhi Zhang
Mingyuan Zhang
Zhiyang Teng
Jiayi Li
Xizhou Zhu
Lewei Lu
Ziwei Liu
Aixin Sun
DiffM
VGen
134
1
0
16 Jan 2024
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video
AAAI Conference on Artificial Intelligence (AAAI), 2024
Zhaobo Qi
Yibo Yuan
Xiaowen Ruan
Shuhui Wang
Weigang Zhang
Qingming Huang
234
10
0
15 Jan 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
AAAI Conference on Artificial Intelligence (AAAI), 2024
Hao Sun
Mingyao Zhou
Wenjing Chen
Wei Xie
PINN
3DGS
ViT
206
68
0
04 Jan 2024
DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Thong Nguyen
Xiaobao Wu
Xinshuai Dong
Cong-Duy Nguyen
See-Kiong Ng
Anh Tuan Luu
248
9
0
05 Dec 2023
VTimeLLM: Empower LLM to Grasp Video Moments
Computer Vision and Pattern Recognition (CVPR), 2024
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
277
231
0
30 Nov 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
European Conference on Computer Vision (ECCV), 2023
Pilhyeon Lee
Hyeran Byun
251
26
0
30 Nov 2023
Temporal Sentence Grounding in Streaming Videos
ACM Multimedia (ACM MM), 2023
Tian Gan
Xiao Wang
Yan Sun
Yue Yu
Qingpei Guo
Liqiang Nie
166
9
0
14 Aug 2023
Knowing Where to Focus: Event-aware Transformer for Video Grounding
IEEE International Conference on Computer Vision (ICCV), 2023
Jinhyun Jang
Jungin Park
Jin-Hwa Kim
Hyeongjun Kwon
Kwanghoon Sohn
179
86
0
14 Aug 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
IEEE International Conference on Computer Vision (ICCV), 2023
Kevin Qinghong Lin
Pengchuan Zhang
Joya Chen
Shraman Pramanick
Difei Gao
Alex Jinpeng Wang
Rui Yan
Mike Zheng Shou
197
186
0
31 Jul 2023
A Survey on Video Moment Localization
ACM Computing Surveys (ACM CSUR), 2022
Meng Liu
Liqiang Nie
Yunxiao Wang
Meng Wang
Yong Rui
308
35
0
13 Jun 2023
MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jiashuo Wang
Aixin Sun
Hao Zhang
Xiaoli Li
ViT
159
17
0
30 May 2023
Generation-Guided Multi-Level Unified Network for Video Grounding
Xingyi Cheng
Xiangyu Wu
Dong Shen
Hezheng Lin
Fan Yang
157
0
0
14 Mar 2023
Towards Diverse Temporal Grounding under Single Positive Labels
Hao Zhou
Chongyang Zhang
Yanjun Chen
Chuanping Hu
152
2
0
12 Mar 2023
Text-Visual Prompting for Efficient 2D Temporal Video Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Yimeng Zhang
Xin Chen
Jinghan Jia
Sijia Liu
Ke Ding
202
30
0
09 Mar 2023
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
Computer Vision and Pattern Recognition (CVPR), 2023
Dezhao Luo
Jiabo Huang
S. Gong
Hailin Jin
Yang Liu
VGen
283
41
0
28 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
183
8
0
16 Feb 2023
MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Wei Ji
Long Chen
Yin-wei Wei
Yiming Wu
Tat-Seng Chua
AI4TS
133
24
0
26 Dec 2022
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Erica K. Shimomoto
Edison Marrese-Taylor
Hiroya Takamura
Ichiro Kobayashi
Hideki Nakayama
Yusuke Miyao
209
8
0
26 Sep 2022
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhijian Hou
Wanjun Zhong
Lei Ji
Difei Gao
Kun Yan
W. Chan
Chong-Wah Ngo
Zheng Shou
Nan Duan
AI4TS
225
33
0
22 Sep 2022
Video-Guided Curriculum Learning for Spoken Video Grounding
ACM Multimedia (ACM MM), 2022
Yan Xia
Zhou Zhao
Shangwei Ye
Yang Zhao
Haoyuan Li
Yi Ren
128
12
0
01 Sep 2022
Can Shuffling Video Benefit Temporal Bias Problem: A Novel Training Framework for Temporal Grounding
European Conference on Computer Vision (ECCV), 2022
Jiachang Hao
Haifeng Sun
Pengfei Ren
Jingyu Wang
Q. Qi
J. Liao
225
33
0
29 Jul 2022
Video Activity Localisation with Uncertainties in Temporal Boundary
European Conference on Computer Vision (ECCV), 2022
Jiabo Huang
Hailin Jin
S. Gong
Yang Liu
232
36
0
26 Jun 2022
You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2022
Xin Sun
Xinyu Wang
Jialin Gao
Qiong Liu
Xiaoping Zhou
179
43
0
25 May 2022
Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Shuo Yang
Xinxiao Wu
201
21
0
12 May 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
Maksim
Guohao Li
M. Donoser
Loris Bazzani
162
26
0
26 Apr 2022
Position-aware Location Regression Network for Temporal Video Grounding
Advanced Video and Signal Based Surveillance (AVSS), 2021
Sunoh Kim
Kimin Yun
J. Choi
124
4
0
12 Apr 2022
Learning Commonsense-aware Moment-Text Alignment for Fast Video Temporal Grounding
Ziyue Wu
Junyu Gao
Shucheng Huang
Changsheng Xu
192
6
0
04 Apr 2022
AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
Computer Vision and Pattern Recognition (CVPR), 2022
Riku Togashi
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
T. Sakai
117
1
0
30 Mar 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
587
64
0
13 Mar 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Amélie Reymond
167
3
0
16 Feb 2022
Explore-And-Match: Bridging Proposal-Based and Proposal-Free With Transformer for Sentence Grounding in Videos
Sangmin Woo
Jinyoung Park
Inyong Koo
Sumin Lee
Minki Jeong
Changick Kim
395
6
0
25 Jan 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
334
49
0
20 Jan 2022
Learning Sample Importance for Cross-Scenario Video Temporal Grounding
International Conference on Multimedia Retrieval (ICMR), 2022
P. Bao
Yadong Mu
118
13
0
08 Jan 2022
LocFormer: Enabling Transformers to Perform Temporal Moment Localization on Long Untrimmed Videos With a Feature Sampling Approach
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Basura Fernando
Hiroya Takamura
Qi Wu
ViT
169
3
0
19 Dec 2021