2308.11062
Cited By
UnLoc: A Unified Framework for Video Localization Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
21 August 2023
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
ArXiv (abs)
PDF
HTML
HuggingFace (1 upvote)
Github (3544★)
Papers citing
"UnLoc: A Unified Framework for Video Localization Tasks"
50 / 51 papers shown
Title
SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
An Yu
Weiheng Lu
Jian Li
Zhenfei Zhang
Yunhang Shen
Felix X.-F. Ye
Ming-Ching Chang
113
1
0
18 Nov 2025
MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization
Zhenying Fang
Richang Hong
120
0
0
17 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
125
0
0
17 Nov 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick
E. Mavroudi
Yale Song
Rama Chellappa
Lorenzo Torresani
Triantafyllos Afouras
128
0
0
19 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
97
1
0
12 Oct 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li
Jing Cheng
Shaoyong Jia
Hangyi Kuang
Shaohui Jiao
Qibin Hou
Ming-Ming Cheng
AI4TS
VLM
180
5
0
22 Sep 2025
Aligning Moments in Time using Video Queries
Yogesh Kumar
Uday Agarwal
Manish Gupta
Anand Mishra
219
1
0
21 Aug 2025
EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos
Junyi Ma
Erhang Zhang
Yin-Dong Zheng
Yuchen Xie
Yixuan Zhou
Hesheng Wang
220
0
0
17 Aug 2025
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng
Jiaqi Mao
Minghao Lai
Minh Hieu Phan
Yanjie Dong
Wei Wang
Qi Chen
Xiping Hu
120
0
0
16 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
144
1
0
10 Aug 2025
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li
Shangzhe Di
Zhonghua Zhai
Weilin Huang
Yanfeng Wang
Weidi Xie
VLM
122
6
0
23 Jun 2025
Zero-Shot Temporal Interaction Localization for Egocentric Videos
Erhang Zhang
Junyi Ma
Yin-Dong Zheng
Yixuan Zhou
Hesheng Wang
342
2
0
04 Jun 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Lei Li
AI4TS
692
9
0
02 May 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
789
0
0
25 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
226
1
0
20 Apr 2025
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon
Cheol-Ho Cho
Woojin Jun
Minho Shim
Taeoh Kim
Inwoong Lee
Dongyoon Wee
Jae-Pil Heo
216
3
0
17 Apr 2025
Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation
Haoyu Ji
Bowen Chen
Weihong Ren
Wenze Huang
Zhihao Yang
Zhiyong Wang
Honghai Liu
196
0
0
19 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
259
3
0
12 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
231
3
0
09 Mar 2025
Data Augmentation for Instruction Following Policies via Trajectory Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Niklas Höpner
Ilaria Tiddi
H. V. Hoof
185
0
0
25 Feb 2025
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
316
3
0
18 Jan 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
362
0
0
31 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
201
4
0
02 Dec 2024
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Peijun Bao
Chenqi Kong
Zihao Shao
Boon Poh Ng
Meng Hwa Er
Alex C. Kot
210
3
0
01 Dec 2024
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Zhongpai Gao
Anwesa Choudhuri
Benjamin Planche
Meng Zheng
Bin Wang
Terrence Chen
Chong Chen
Ziyan Wu
AI4TS
296
5
0
25 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
248
7
0
17 Nov 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
International Conference on Learning Representations (ICLR), 2024
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM
VLM
AI4TS
247
54
0
25 Oct 2024
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
Josiah Aklilu
Xiaohan Wang
Serena Yeung-Levy
284
1
0
18 Oct 2024
Language-Assisted Human Part Motion Learning for Skeleton-Based Temporal Action Segmentation
Bowen Chen
Haoyu Ji
Zhiyong Wang
Benjamin Filtjens
C. Wang
Weihong Ren
Bart Vanrumste
Honghai Liu
185
1
0
08 Oct 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Neural Information Processing Systems (NeurIPS), 2024
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
190
28
0
29 Aug 2024
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
International Conference on Pattern Recognition (ICPR), 2024
Massimo Bosetti
Shibingfeng Zhang
Benedetta Liberatori
Giacomo Zara
Elisa Ricci
Paolo Rota
VLM
202
5
0
29 Aug 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
Kaijing Ma
Haojian Huang
Jin Chen
Haodong Chen
Pengliang Ji
...
Han Fang
Chao Ban
Hao Sun
Mulin Chen
Xuelong Li
212
11
0
29 Aug 2024
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
160
2
0
13 Aug 2024
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization
Jeongseok Hyun
Su Ho Han
Hyolim Kang
Joon-Young Lee
Seon Joo Kim
VLM
238
3
0
09 Jul 2024
Described Spatial-Temporal Video Detection
Wei Ji
Xiangyan Liu
Yingfei Sun
Jiajun Deng
You Qin
Ammar Nuwanna
Mengyao Qiu
Lina Wei
Roger Zimmermann
246
3
0
08 Jul 2024
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng
Yujiang Pu
Shaogang Gong
Parisa Kordjamshidi
Yu Kong
AI4TS
205
2
0
06 Jul 2024
Chrono: A Simple Blueprint for Representing Time in MLLMs
Boris Meinardus
Anil Batra
Anna Rohrbach
Marcus Rohrbach
MLLM
VLM
465
4
0
26 Jun 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
279
101
0
17 Jun 2024
Localizing Events in Videos with Multimodal Queries
Computer Vision and Pattern Recognition (CVPR), 2024
Gengyuan Zhang
Mang Ling Ada Fok
Yan Xia
Yansong Tang
Zorah Lähner
Juil Sock
Volker Tresp
Jindong Gu
288
4
0
14 Jun 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Lin Wang
259
9
0
21 May 2024
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori
Alessandro Conti
Paolo Rota
Yiming Wang
Elisa Ricci
251
9
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
433
14
0
07 Apr 2024
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
171
31
0
31 Mar 2024
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
179
97
0
27 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
243
28
0
26 Mar 2024
TFCounter: Polishing Gems for Training-Free Object Counting
Pan Ting
Jianfeng Lin
Wenhao Yu
Wenlong Zhang
Xiaoying Chen
Jinlu Zhang
Binqiang Huang
134
1
0
12 Mar 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
386
7
0
03 Jan 2024
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
204
18
0
28 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
433
44
0
11 Dec 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
European Conference on Computer Vision (ECCV), 2023
Pilhyeon Lee
Hyeran Byun
243
26
0
30 Nov 2023