1705.02101
Cited By
TALL: Temporal Activity Localization via Language Query
5 May 2017
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
Papers citing "TALL: Temporal Activity Localization via Language Query"
50 / 420 papers shown
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
59
3
0
21 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan
Jian-Huang Lai
Wei-Shi Zheng
Jianfang Hu
AI4TS
38
5
0
18 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
32
30
0
15 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
TutoAI: A Cross-domain Framework for AI-assisted Mixed-media Tutorial Creation on Physical Tasks
Yuexi Chen
Vlad I. Morariu
Anh Truong
Zhicheng Liu
DiffM
VGen
31
3
0
12 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
35
180
0
11 Mar 2024
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
Yifang Xu
Yunzhuo Sun
Zien Xie
Benxiang Zhai
Sidan Du
43
6
0
04 Mar 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
29
29
0
20 Feb 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
60
47
0
18 Feb 2024
Video Editing for Video Retrieval
Bin Zhu
Kevin Flanagan
A. Fragomeni
Michael Wray
Dima Damen
CLIP
29
0
0
04 Feb 2024
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Haibo Wang
Chenghang Lai
Yixuan Sun
Weifeng Ge
17
5
0
19 Jan 2024
Temporal Insight Enhancement: Mitigating Temporal Hallucination in Multimodal Large Language Models
Li Sun
Liuan Wang
Jun Sun
Takayuki Okatani
MLLM
19
0
0
18 Jan 2024
Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization
Chongzhi Zhang
Mingyuan Zhang
Zhiyang Teng
Jiayi Li
Xizhou Zhu
Lewei Lu
Ziwei Liu
Aixin Sun
DiffM
VGen
18
0
0
16 Jan 2024
GroundingGPT: Language Enhanced Multi-modal Grounding Model
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
28
37
0
11 Jan 2024
Towards Weakly Supervised Text-to-Audio Grounding
Xuenan Xu
Ziyang Ma
Mengyue Wu
Kai Yu
AI4TS
28
9
0
05 Jan 2024
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
Hao Sun
Mingyao Zhou
Wenjing Chen
Wei Xie
PINN
3DGS
ViT
19
32
0
04 Jan 2024
Video-GroundingDINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
18
12
0
31 Dec 2023
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
50
82
0
29 Dec 2023
Commonsense for Zero-Shot Natural Language Video Localization
Meghana Holla
Ismini Lourentzou
27
3
0
29 Dec 2023
Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Houlun Chen
Xin Wang
Hong Chen
Zihan Song
Jia Jia
Wenwu Zhu
LRM
31
10
0
28 Dec 2023
Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding
Sunoh Kim
Jungchan Cho
Joonsang Yu
Youngjoon Yoo
Jin Young Choi
16
9
0
27 Dec 2023
A Strong Baseline for Temporal Video-Text Alignment
Zeqian Li
Qirui Chen
Tengda Han
Ya-Qin Zhang
Yanfeng Wang
Weidi Xie
AI4TS
VGen
24
5
0
21 Dec 2023
LLM4VG: Large Language Models Evaluation for Video Grounding
Wei Feng
Xin Wang
Hong Chen
Zeyang Zhang
Zihan Song
Yuwei Zhou
Wenwu Zhu
39
8
0
21 Dec 2023
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding
Haifeng Huang
Yang Zhao
Zehan Wang
Yan Xia
Zhou Zhao
31
1
0
21 Dec 2023
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu
Jun Li
Hongtao Xie
Pandeng Li
Jiannan Ge
Sun-Ao Liu
Guoqing Jin
35
18
0
19 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
37
23
0
11 Dec 2023
DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
Thong Nguyen
Xiaobao Wu
Xinshuai Dong
Cong-Duy Nguyen
See-Kiong Ng
Anh Tuan Luu
29
7
0
05 Dec 2023
EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model
Guozhang Li
Xinpeng Ding
De-Chun Cheng
Jie Li
Nannan Wang
Xinbo Gao
32
1
0
05 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
23
174
0
04 Dec 2023
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
21
0
0
01 Dec 2023
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
87
113
0
30 Nov 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee
Hyeran Byun
19
10
0
30 Nov 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
...
Jilan Xu
Guo Chen
Ping Luo
Limin Wang
Yu Qiao
VLM
MLLM
56
398
0
28 Nov 2023
Query by Activity Video in the Wild
Tao Hu
William Thong
Pascal Mettes
Cees G. M. Snoek
22
0
0
23 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
21
4
0
15 Nov 2023
Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols
Iqra Qasim
Alexander Horsch
Dilip K. Prasad
17
5
0
05 Nov 2023
Harvest Video Foundation Models via Efficient Post-Pretraining
Yizhuo Li
Kunchang Li
Yinan He
Yi Wang
Yali Wang
Limin Wang
Yu Qiao
Ping Luo
CLIP
VLM
VGen
46
2
0
30 Oct 2023
Learning Temporal Sentence Grounding From Narrated EgoVideos
Kevin Flanagan
Dima Damen
Michael Wray
23
3
0
26 Oct 2023
Exploring Iterative Refinement with Diffusion Models for Video Grounding
Xiao Liang
Tao Shi
Yaoyuan Liang
Te Tao
Shao-Lun Huang
DiffM
27
1
0
26 Oct 2023
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Jiang Ji
Meng Cao
Tengtao Song
Long Chen
Yi Wang
Yuexian Zou
19
6
0
25 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
Piyush Singh Pasi
Karthikeya Battepati
P. Jyothi
Ganesh Ramakrishnan
T. Mahapatra
Manoj Singh
51
0
0
10 Oct 2023
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
Yuting Wang
Jinpeng Wang
Bin Chen
Ziyun Zeng
Shu-Tao Xia
38
8
0
08 Oct 2023
SCANet: Scene Complexity Aware Network for Weakly-Supervised Video Moment Retrieval
Sunjae Yoon
Gwanhyeong Koo
Dahyun Kim
Changdong Yoo
21
12
0
08 Oct 2023
A Hierarchical Graph-based Approach for Recognition and Description Generation of Bimanual Actions in Videos
Fatemeh Ziaeetabar
Reza Safabakhsh
S. Momtazi
M. Tamosiunaite
F. Worgotter
17
1
0
01 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
13
26
0
25 Sep 2023
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan
Xuange Zhang
Kun Liu
Bo Liu
Chen Chen
Jian Jin
Zhenzhen Jiao
AI4TS
19
13
0
25 Sep 2023
Dual-Path Temporal Map Optimization for Make-up Temporal Video Grounding
Jiaxiu Li
Kun Li
Jia Li
Guoliang Chen
Dan Guo
Meng Wang
31
3
0
12 Sep 2023
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao
Angela Yao
Yicong Li
Tat-Seng Chua
28
46
0
04 Sep 2023
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains
Divyanshu Raj
Chitta Baral
N. Gopalan
72
1
0
01 Sep 2023