2308.11062
Cited By
UnLoc: A Unified Framework for Video Localization Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
21 August 2023
Shen Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
HuggingFace (1 upvotes)
Github (3544★)
Papers citing
"UnLoc: A Unified Framework for Video Localization Tasks"
50 / 51 papers shown
Title
SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
An Yu
Weiheng Lu
Jian Li
Zhenfei Zhang
Yunhang Shen
Felix X.-F. Ye
Ming-Ching Chang
97
0
0
18 Nov 2025
MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization
Zhenying Fang
Richang Hong
108
0
0
17 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
89
0
0
17 Nov 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick
E. Mavroudi
Yale Song
Rama Chellappa
Lorenzo Torresani
Triantafyllos Afouras
124
0
0
19 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
93
1
0
12 Oct 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li
Jing Cheng
Shaoyong Jia
Hangyi Kuang
Shaohui Jiao
Qibin Hou
Ming-Ming Cheng
AI4TS
VLM
160
3
0
22 Sep 2025
Aligning Moments in Time using Video Queries
Yogesh Kumar
Uday Agarwal
Manish Gupta
Anand Mishra
207
1
0
21 Aug 2025
EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos
Junyi Ma
Erhang Zhang
Yin-Dong Zheng
Yuchen Xie
Yixuan Zhou
Hesheng Wang
208
0
0
17 Aug 2025
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng
Jiaqi Mao
Minghao Lai
Minh Hieu Phan
Yanjie Dong
Wei Wang
Qi Chen
Xiping Hu
104
0
0
16 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
124
1
0
10 Aug 2025
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li
Shangzhe Di
Zhonghua Zhai
Weilin Huang
Yanfeng Wang
Weidi Xie
VLM
106
4
0
23 Jun 2025
Zero-Shot Temporal Interaction Localization for Egocentric Videos
Erhang Zhang
Junyi Ma
Yin-Dong Zheng
Yixuan Zhou
Hesheng Wang
310
2
0
04 Jun 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Lei Li
AI4TS
684
8
0
02 May 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
781
0
0
25 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
226
1
0
20 Apr 2025
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon
Cheol-Ho Cho
Woojin Jun
Minho Shim
Taeoh Kim
Inwoong Lee
Dongyoon Wee
Jae-Pil Heo
212
3
0
17 Apr 2025
Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation
Haoyu Ji
Bowen Chen
Weihong Ren
Wenze Huang
Zhihao Yang
Zhiyong Wang
Honghai Liu
180
0
0
19 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
251
3
0
12 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
223
3
0
09 Mar 2025
Data Augmentation for Instruction Following Policies via Trajectory Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Niklas Höpner
Ilaria Tiddi
H. V. Hoof
185
0
0
25 Feb 2025
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
288
3
0
18 Jan 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
350
0
0
31 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
189
4
0
02 Dec 2024
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Peijun Bao
Chenqi Kong
Zihao Shao
Boon Poh Ng
Meng Hwa Er
Alex C. Kot
210
3
0
01 Dec 2024
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Zhongpai Gao
Anwesa Choudhuri
Benjamin Planche
Meng Zheng
Bin Wang
Terrence Chen
Chong Chen
Ziyan Wu
AI4TS
292
4
0
25 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
240
7
0
17 Nov 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
International Conference on Learning Representations (ICLR), 2024
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM
VLM
AI4TS
235
52
0
25 Oct 2024
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
Josiah Aklilu
Xiaohan Wang
Serena Yeung-Levy
280
1
0
18 Oct 2024
Language-Assisted Human Part Motion Learning for Skeleton-Based Temporal Action Segmentation
Bowen Chen
Haoyu Ji
Zhiyong Wang
Benjamin Filtjens
C. Wang
Weihong Ren
Bart Vanrumste
Honghai Liu
175
1
0
08 Oct 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Neural Information Processing Systems (NeurIPS), 2024
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
178
26
0
29 Aug 2024
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
International Conference on Pattern Recognition (ICPR), 2024
Massimo Bosetti
Shibingfeng Zhang
Benedetta Liberatori
Giacomo Zara
Elisa Ricci
Paolo Rota
VLM
202
5
0
29 Aug 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
Kaijing Ma
Haojian Huang
Jin Chen
Haodong Chen
Pengliang Ji
...
Han Fang
Chao Ban
Hao Sun
Mulin Chen
Xuelong Li
208
11
0
29 Aug 2024
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
152
2
0
13 Aug 2024
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization
Jeongseok Hyun
Su Ho Han
Hyolim Kang
Joon-Young Lee
Seon Joo Kim
VLM
222
3
0
09 Jul 2024
Described Spatial-Temporal Video Detection
Wei Ji
Xiangyan Liu
Yingfei Sun
Jiajun Deng
You Qin
Ammar Nuwanna
Mengyao Qiu
Lina Wei
Roger Zimmermann
230
3
0
08 Jul 2024
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng
Yujiang Pu
Shaogang Gong
Parisa Kordjamshidi
Yu Kong
AI4TS
201
2
0
06 Jul 2024
Chrono: A Simple Blueprint for Representing Time in MLLMs
Boris Meinardus
Anil Batra
Anna Rohrbach
Marcus Rohrbach
MLLM
VLM
465
4
0
26 Jun 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
275
99
0
17 Jun 2024
Localizing Events in Videos with Multimodal Queries
Computer Vision and Pattern Recognition (CVPR), 2024
Gengyuan Zhang
Mang Ling Ada Fok
Yan Xia
Yansong Tang
Zorah Lähner
Juil Sock
Volker Tresp
Jindong Gu
288
4
0
14 Jun 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Lin Wang
259
9
0
21 May 2024
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori
Alessandro Conti
Paolo Rota
Yiming Wang
Elisa Ricci
207
9
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
413
14
0
07 Apr 2024
R²-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
171
31
0
31 Mar 2024
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
179
95
0
27 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
231
28
0
26 Mar 2024
TFCounter: Polishing Gems for Training-Free Object Counting
Pan Ting
Jianfeng Lin
Wenhao Yu
Wenlong Zhang
Xiaoying Chen
Jinlu Zhang
Binqiang Huang
134
1
0
12 Mar 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
374
7
0
03 Jan 2024
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
196
18
0
28 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
433
43
0
11 Dec 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
European Conference on Computer Vision (ECCV), 2023
Pilhyeon Lee
Hyeran Byun
243
26
0
30 Nov 2023