Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1705.02101
Cited By
TALL: Temporal Activity Localization via Language Query
5 May 2017
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
Re-assign community
ArXiv
PDF
HTML
Papers citing
"TALL: Temporal Activity Localization via Language Query"
50 / 420 papers shown
Title
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
Lu Dong
H. Zhang
Hongjie Zhang
Y. Huang
Z. Ling
Yu Qiao
Limin Wang
Y. Wang
AI4TS
26
0
0
10 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping-Chia Huang
OffRL
49
0
0
08 May 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
36
0
0
07 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
123
0
0
02 May 2025
Exploiting Inter-Sample Correlation and Intra-Sample Redundancy for Partially Relevant Video Retrieval
Junlong Ren
Gangjian Zhang
Y. Hu
Jian Shu
H. Wang
29
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
Vidi: Large Multimodal Models for Video Understanding and Editing
Vidi Team
Celong Liu
Chia-Wen Kuo
Dawei Du
Fan Chen
...
Wen Zhong
Xiaohui Shen
Xin Gu
Xing Mei
Xueqiong Qu
67
0
0
22 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
30
0
0
20 Apr 2025
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon
Cheol-Ho Cho
Woojin Jun
Minho Shim
Taeoh Kim
Inwoong Lee
Dongyoon Wee
Jae-Pil Heo
29
0
0
17 Apr 2025
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
H. Tran
Tinh-Anh Nguyen-Nhu
Huu-Phong Phan-Nguyen
T. Nguyen
Nhat-Minh Nguyen-Dich
Anh Dao
Huy-Duc Do
Quan Nguyen
Hoang M. Le
Quang-Vinh Dinh
26
0
0
11 Apr 2025
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
Henghao Zhao
Ge-Peng Ji
Rui Yan
Huan Xiong
Zechao Li
24
0
0
10 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Lu Dong
Xiangyu Zeng
Yinan He
Y. Wang
Yu Qiao
Yi Wang
Limin Wang
VLM
AI4TS
LRM
38
2
0
09 Apr 2025
LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
Ziyi Wang
Haoran Wu
Yiming Rong
Deyang Jiang
Yixin Zhang
Y. Zhao
Shuang Xu
Bo Xu
VLM
44
0
0
09 Apr 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
27
0
0
08 Apr 2025
Moment Quantization for Video Temporal Grounding
Xiaolong Sun
Le Wang
Sanping Zhou
Liushuai Shi
Kun Xia
Mengnan Liu
Yabing Wang
Gang Hua
MQ
29
0
0
03 Apr 2025
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
Junwen Pan
Rui Zhang
Xin Wan
Yuan Zhang
Ming Lu
Qi She
VLM
44
1
0
02 Apr 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko
S. Kim
Yumin Suh
Vijay Kumar B.G
Minseo Yoon
Manmohan Chandraker
Hyunwoo J. Kim
LRM
41
0
0
25 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zheng Liu
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
97
0
0
24 Mar 2025
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization
Zhuo Tao
Liang Li
Qi Chen
Yunbin Tu
Zheng-Jun Zha
Ming-Hsuan Yang
Yuankai Qi
Qingming Huang
45
0
0
22 Mar 2025
Temporal Action Detection Model Compression by Progressive Block Drop
Xiaoyong Chen
Yong Guo
Jiaming Liang
Sitong Zhuang
Runhao Zeng
Xiping Hu
50
0
0
21 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
79
0
0
17 Mar 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
70
2
0
14 Mar 2025
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Jiali Yao
Xinran Deng
Xin Gu
Mengrui Dai
Bing Fan
Zhipeng Zhang
Yan Huang
Heng Fan
L. Zhang
56
0
0
13 Mar 2025
Generative Frame Sampler for Long Video Understanding
Linli Yao
Haoning Wu
Kun Ouyang
Y. Zhang
Caiming Xiong
Bei Chen
Xu Sun
Junnan Li
VLM
VGen
52
0
0
12 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
56
0
0
12 Mar 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
44
0
0
12 Mar 2025
Deep Understanding of Sign Language for Sign to Subtitle Alignment
Youngjoon Jang
Jeongsoo Choi
Junseok Ahn
Joon Son Chung
SLR
76
0
0
05 Mar 2025
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
38
1
0
18 Jan 2025
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection
Yifang Xu
Yunzhuo Sun
Benxiang Zhai
Zien Xie
Youyao Jia
S. Du
42
2
0
18 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Length-Aware DETR for Robust Moment Retrieval
S. Park
Jiho Choi
Kyungjune Baek
Hyunjung Shim
34
0
0
31 Dec 2024
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
75
25
0
31 Dec 2024
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
Zhuo Cao
Bingqing Zhang
Heming Du
Xin Yu
Xue Li
Sen Wang
67
1
0
18 Dec 2024
NowYouSee Me: Context-Aware Automatic Audio Description
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
83
0
0
13 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
109
1
0
12 Dec 2024
Streaming Detection of Queried Event Start
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
74
0
0
04 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
83
0
0
04 Dec 2024
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval
Dhiman Paul
Md Rizwan Parvez
Nabeel Mohammed
Shafin Rahman
VGen
67
0
0
02 Dec 2024
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Peijun Bao
Chenqi Kong
Zihao Shao
Boon Poh Ng
Meng Hwa Er
Alex C. Kot
54
2
0
01 Dec 2024
TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability
Shimin Chen
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
MLLM
73
13
0
27 Nov 2024
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Yueqian Wang
Xiaojun Meng
Y. Wang
Jianxin Liang
Jiansheng Wei
Huishuai Zhang
Dongyan Zhao
VGen
83
8
0
27 Nov 2024
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Andong Deng
Zhongpai Gao
Anwesa Choudhuri
Benjamin Planche
Meng Zheng
Bin Wang
Terrence Chen
C. L. P. Chen
Ziyan Wu
AI4TS
83
1
0
25 Nov 2024
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko
Tinghuai Wang
Wassim Swaileh
Shiyan Sun
Ioannis Patras
KELM
VLM
77
0
0
23 Nov 2024
On the Consistency of Video Large Language Models in Temporal Comprehension
Minjoon Jung
Junbin Xiao
Byoung-Tak Zhang
Angela Yao
83
2
0
20 Nov 2024
Grounded Video Caption Generation
Evangelos Kazakos
Cordelia Schmid
Josef Sivic
30
0
0
12 Nov 2024
VARS: Vision-based Assessment of Risk in Security Systems
Pranav Gupta
Pratham Gohil
Sridhar S
19
0
0
25 Oct 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM
VLM
AI4TS
69
14
0
25 Oct 2024
Let Me Finish My Sentence: Video Temporal Grounding with Holistic Text Understanding
Jongbhin Woo
H. Ryu
Youngjoon Jang
Jae-Won Cho
Joon Son Chung
26
1
0
17 Oct 2024
OMCAT: Omni Context Aware Transformer
Arushi Goel
Karan Sapra
Matthieu Le
Rafael Valle
Andrew Tao
Bryan Catanzaro
MLLM
VLM
16
0
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
49
6
1
14 Oct 2024
1
2
3
4
5
6
7
8
9
Next