UnLoc: A Unified Framework for Video Localization Tasks

IEEE International Conference on Computer Vision (ICCV), 2023
21 August 2023
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid

Papers citing "UnLoc: A Unified Framework for Video Localization Tasks"

50 / 51 papers shown
SMART: Shot-Aware Multimodal Video Moment Retrieval with Audio-Enhanced MLLM
An Yu
Weiheng Lu
Jian Li
Zhenfei Zhang
Yunhang Shen
Felix X.-F. Ye
Ming-Ching Chang
97
0
0
18 Nov 2025
MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization
Zhenying Fang
Richang Hong
108
0
0
17 Nov 2025
Learning Skill-Attributes for Transferable Assessment in Video
Kumar Ashutosh
Kristen Grauman
89
0
0
17 Nov 2025
Enrich and Detect: Video Temporal Grounding with Multimodal LLMs
Shraman Pramanick
E. Mavroudi
Yale Song
Rama Chellappa
Lorenzo Torresani
Triantafyllos Afouras
124
0
0
19 Oct 2025
Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
Jinxuan Li
Chaolei Tan
Haoxuan Chen
Jianxin Ma
Jian-Fang Hu
Wei-Shi Zheng
Jianhuang Lai
VLM
93
1
0
12 Oct 2025
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li
Jing Cheng
Shaoyong Jia
Hangyi Kuang
Shaohui Jiao
Qibin Hou
Ming-Ming Cheng
AI4TS, VLM
160
3
0
22 Sep 2025
Aligning Moments in Time using Video Queries
Yogesh Kumar
Uday Agarwal
Manish Gupta
Anand Mishra
207
1
0
21 Aug 2025
EgoLoc: A Generalizable Solution for Temporal Interaction Localization in Egocentric Videos
Junyi Ma
Erhang Zhang
Yin-Dong Zheng
Yuchen Xie
Yixuan Zhou
Hesheng Wang
208
0
0
17 Aug 2025
OVG-HQ: Online Video Grounding with Hybrid-modal Queries
Runhao Zeng
Jiaqi Mao
Minghao Lai
Minh Hieu Phan
Yanjie Dong
Wei Wang
Qi Chen
Xiping Hu
104
0
0
16 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
124
1
0
10 Aug 2025
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li
Shangzhe Di
Zhonghua Zhai
Weilin Huang
Yanfeng Wang
Weidi Xie
VLM
106
4
0
23 Jun 2025
Zero-Shot Temporal Interaction Localization for Egocentric Videos
Erhang Zhang
Junyi Ma
Yin-Dong Zheng
Yixuan Zhou
Hesheng Wang
310
2
0
04 Jun 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Lei Li
AI4TS
684
8
0
02 May 2025
HierSum: A Global and Local Attention Mechanism for Video Summarization
Apoorva Beedu
Irfan Essa
781
0
0
25 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
226
1
0
20 Apr 2025
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon
Cheol-Ho Cho
Woojin Jun
Minho Shim
Taeoh Kim
Inwoong Lee
Dongyoon Wee
Jae-Pil Heo
212
3
0
17 Apr 2025
Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation
Haoyu Ji
Bowen Chen
Weihong Ren
Wenze Huang
Zhihao Yang
Zhiyong Wang
Honghai Liu
180
0
0
19 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
251
3
0
12 Mar 2025
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos
Chen-Da Liu-Zhang
Lin Sui
Shuming Liu
Fangzhou Mu
Ziyi Wang
Bernard Ghanem
223
3
0
09 Mar 2025
Data Augmentation for Instruction Following Policies via Trajectory Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2025
Niklas Höpner
Ilaria Tiddi
H. V. Hoof
185
0
0
25 Feb 2025
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
288
3
0
18 Jan 2025
Multimodal Fusion and Coherence Modeling for Video Topic Segmentation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Hai Yu
Chong Deng
Qinglin Zhang
Jiaqing Liu
Qian Chen
Wen Wang
350
0
0
31 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
189
4
0
02 Dec 2024
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Peijun Bao
Chenqi Kong
Zihao Shao
Boon Poh Ng
Meng Hwa Er
Alex C. Kot
210
3
0
01 Dec 2024
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Andong Deng
Zhongpai Gao
Anwesa Choudhuri
Benjamin Planche
Meng Zheng
Bin Wang
Terrence Chen
Chong Chen
Ziyan Wu
AI4TS
292
4
0
25 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM, ObjD
240
7
0
17 Nov 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
International Conference on Learning Representations (ICLR), 2024
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM, VLM, AI4TS
235
52
0
25 Oct 2024
Zero-shot Action Localization via the Confidence of Large Vision-Language Models
Josiah Aklilu
Xiaohan Wang
Serena Yeung-Levy
280
1
0
18 Oct 2024
Language-Assisted Human Part Motion Learning for Skeleton-Based Temporal Action Segmentation
Bowen Chen
Haoyu Ji
Zhiyong Wang
Benjamin Filtjens
C. Wang
Weihong Ren
Bart Vanrumste
Honghai Liu
175
1
0
08 Oct 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Neural Information Processing Systems (NeurIPS), 2024
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
178
26
0
29 Aug 2024
Text-Enhanced Zero-Shot Action Recognition: A training-free approach
International Conference on Pattern Recognition (ICPR), 2024
Massimo Bosetti
Shibingfeng Zhang
Benedetta Liberatori
Giacomo Zara
Elisa Ricci
Paolo Rota
VLM
202
5
0
29 Aug 2024
Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding
Kaijing Ma
Haojian Huang
Jin Chen
Haodong Chen
Pengliang Ji
...
Han Fang
Chao Ban
Hao Sun
Mulin Chen
Xuelong Li
208
11
0
29 Aug 2024
ActPrompt: In-Domain Feature Adaptation via Action Cues for Video Temporal Grounding
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
152
2
0
13 Aug 2024
Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization
Jeongseok Hyun
Su Ho Han
Hyolim Kang
Joon-Young Lee
Seon Joo Kim
VLM
222
3
0
09 Jul 2024
Described Spatial-Temporal Video Detection
Wei Ji
Xiangyan Liu
Yingfei Sun
Jiajun Deng
You Qin
Ammar Nuwanna
Mengyao Qiu
Lina Wei
Roger Zimmermann
230
3
0
08 Jul 2024
SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
Zixu Cheng
Yujiang Pu
Shaogang Gong
Parisa Kordjamshidi
Yu Kong
AI4TS
201
2
0
06 Jul 2024
Chrono: A Simple Blueprint for Representing Time in MLLMs
Boris Meinardus
Anil Batra
Anna Rohrbach
Marcus Rohrbach
MLLM, VLM
465
4
0
26 Jun 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM, MoMe
275
99
0
17 Jun 2024
Localizing Events in Videos with Multimodal Queries
Computer Vision and Pattern Recognition (CVPR), 2024
Gengyuan Zhang
Mang Ling Ada Fok
Yan Xia
Yansong Tang
Zorah Lähner
Juil Sock
Volker Tresp
Jindong Gu
288
4
0
14 Jun 2024
Context-Enhanced Video Moment Retrieval with Large Language Models
Weijia Liu
Bo Miao
Jiuxin Cao
Xueling Zhu
Bo Liu
Mehwish Nasim
Lin Wang
259
9
0
21 May 2024
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori
Alessandro Conti
Paolo Rota
Yiming Wang
Elisa Ricci
207
9
0
08 Apr 2024
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
413
14
0
07 Apr 2024
$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
171
31
0
31 Mar 2024
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
179
95
0
27 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM, VGen
231
28
0
26 Mar 2024
TFCounter: Polishing Gems for Training-Free Object Counting
Pan Ting
Jianfeng Lin
Wenhao Yu
Wenlong Zhang
Xiaoying Chen
Jinlu Zhang
Binqiang Huang
134
1
0
12 Mar 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
374
7
0
03 Jan 2024
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
196
18
0
28 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
433
43
0
11 Dec 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
European Conference on Computer Vision (ECCV), 2023
Pilhyeon Lee
Hyeran Byun
243
26
0
30 Nov 2023