ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models

1 October 2024

Xiaodong Chen


Papers citing "ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models"

23 papers

TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning

333

03 Dec 2025

Learning to Refuse: Refusal-Aware Reinforcement Fine-Tuning for Hard-Irrelevant Queries in Video Temporal Grounding

240

28 Nov 2025

VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations

...

193

27 Oct 2025

Enrich and Detect: Video Temporal Grounding with Multimodal LLMs

Triantafyllos Afouras

277

19 Oct 2025

SVAG-Bench: A Large-Scale Benchmark for Multi-Instance Spatio-temporal Video Action Grounding

349

14 Oct 2025

Video-in-the-Loop: Span-Grounded Long Video QA with Interleaved Reasoning

...

333

05 Oct 2025

OVG-HQ: Online Video Grounding with Hybrid-modal Queries

190

16 Aug 2025

Empowering Multimodal LLMs with External Tools: A Comprehensive Survey

251

14 Aug 2025

A Survey on Video Temporal Grounding with Multimodal Large Language Model

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025

175

07 Aug 2025

DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs

An-Zi Yen

409

13 Jun 2025

DisTime: Distribution-based Time Representation for Video Large Language Models

293

30 May 2025

MotionPro: A Precise Motion Controller for Image-to-Video Generation

Computer Vision and Pattern Recognition (CVPR), 2025

437

26 May 2025

Object-Shot Enhanced Grounding Network for Egocentric Video

Computer Vision and Pattern Recognition (CVPR), 2025

316

07 May 2025

TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

...

942

02 May 2025

Ask2Loc: Learning to Locate Instructional Visual Answers by Asking Questions

1.1K

22 Apr 2025

Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation

489

01 Apr 2025

VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning

1.1K

17 Mar 2025

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

...

687

154

21 Jan 2025

T-SVG: Text-Driven Stereoscopic Video Generation

342

12 Dec 2024

TimeRefine: Temporal Grounding with Time Refining Video LLM

606

12 Dec 2024

TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability

372

27 Nov 2024

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Computer Vision and Pattern Recognition (CVPR), 2025

268

22 Nov 2024

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

International Conference on Learning Representations (ICLR), 2025

...

Yali Wang

357

25 Oct 2024