ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.19046
  4. Cited By
LITA: Language Instructed Temporal-Localization Assistant

LITA: Language Instructed Temporal-Localization Assistant

27 March 2024
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
    VLM
ArXivPDFHTML

Papers citing "LITA: Language Instructed Temporal-Localization Assistant"

18 / 18 papers shown
Title
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
"I Can See Forever!": Evaluating Real-time VideoLLMs for Assisting Individuals with Visual Impairments
Z. Zhang
Zhen Sun
Z. Zhang
Zifan Peng
Yuemeng Zhao
Z. Wang
Zeren Luo
Ruiting Zuo
Xinlei He
38
0
0
07 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
35
0
0
02 May 2025
MINERVA: Evaluating Complex Video Reasoning
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani
Sachit Menon
Ahmet Iscen
Shyamal Buch
Ramin Mehran
...
Yukun Zhu
Carl Vondrick
Mikhail Sirotenko
Cordelia Schmid
Tobias Weyand
53
0
0
01 May 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
62
0
0
17 Mar 2025
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Gedas Bertasius
Lorenzo Torresani
55
0
0
12 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
40
0
0
08 Mar 2025
TimeLogic: A Temporal Logic Benchmark for Video QA
TimeLogic: A Temporal Logic Benchmark for Video QA
S. Swetha
Hilde Kuehne
Mubarak Shah
37
1
0
13 Jan 2025
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
96
4
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
97
1
0
12 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
77
0
0
04 Dec 2024
On the Consistency of Video Large Language Models in Temporal Comprehension
On the Consistency of Video Large Language Models in Temporal Comprehension
Minjoon Jung
Junbin Xiao
Byoung-Tak Zhang
Angela Yao
83
2
0
20 Nov 2024
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng
Tongjia Chen
Shoubin Yu
Taojiannan Yang
Lincoln Spencer
Yapeng Tian
Ajmal Saeed Mian
Mohit Bansal
Chen Chen
LRM
46
1
0
15 Nov 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM
VLM
AI4TS
55
14
0
25 Oct 2024
X-VILA: Cross-Modality Alignment for Large Language Model
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye
De-An Huang
Yao Lu
Zhiding Yu
Wei Ping
...
Jan Kautz
Song Han
Dan Xu
Pavlo Molchanov
Hongxu Yin
MLLM
VLM
34
29
0
29 May 2024
Momentor: Advancing Video Large Language Model with Fine-Grained
  Temporal Reasoning
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
57
9
0
18 Feb 2024
VTimeLLM: Empower LLM to Grasp Video Moments
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
72
80
0
30 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before
  Projection
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM
MLLM
182
576
0
16 Nov 2023
MiniGPT-v2: large language model as a unified interface for
  vision-language multi-task learning
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
152
280
0
14 Oct 2023
1