EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video
Grounding with Multimodal Large Language Model

EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model

5 December 2023

Jie Li

Nannan Wang

Xinbo Gao

Papers citing "EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model"

6 / 6 papers shown

Title
Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models Xinpeng Ding Jinahua Han Hang Xu Xiaodan Liang Wei Zhang Xiaomeng Li 15 38 0 02 Jan 2024
VideoLLM: Modeling Video Sequence with Large Language Models Guo Chen Yin-Dong Zheng Jiahao Wang Jilan Xu Yifei Huang ... Yi Wang Yali Wang Yu Qiao Tong Lu Limin Wang MLLM 92 76 0 22 May 2023
Boosting Weakly-Supervised Temporal Action Localization with Text Information Guozhang Li De-Chun Cheng Xinpeng Ding N. Wang Xiaoyu Wang Xinbo Gao 23 21 0 01 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models Junnan Li Dongxu Li Silvio Savarese Steven C. H. Hoi VLM MLLM 244 4,186 0 30 Jan 2023
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation Junnan Li Dongxu Li Caiming Xiong S. Hoi MLLM BDL VLM CLIP 385 4,010 0 28 Jan 2022
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Chao Jia Yinfei Yang Ye Xia Yi-Ting Chen Zarana Parekh Hieu H. Pham Quoc V. Le Yun-hsuan Sung Zhen Li Tom Duerig VLM CLIP 293 3,683 0 11 Feb 2021