ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.10711
  4. Cited By
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal
  Models for Video Question Answering

Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

19 January 2024
Haibo Wang
Chenghang Lai
Yixuan Sun
Weifeng Ge
ArXivPDFHTML

Papers citing "Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering"

7 / 7 papers shown
Title
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu
Chengkai Jin
Huanyu Wang
Zhenghao Chen
Sheng Jin
...
Zhenbang Sun
Bingni Zhang
Jiawei Wu
Hao Zhang
Qianru Sun
41
5
0
04 Oct 2024
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Qinghao Ye
Guohai Xu
Ming Yan
Haiyang Xu
Qi Qian
Ji Zhang
Fei Huang
VLM
AI4TS
152
69
0
30 Dec 2022
Video Graph Transformer for Video Question Answering
Video Graph Transformer for Video Question Answering
Junbin Xiao
Pan Zhou
Tat-Seng Chua
Shuicheng Yan
ViT
131
73
0
12 Jul 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
Bridge to Answer: Structure-aware Graph Interaction Network for Video
  Question Answering
Bridge to Answer: Structure-aware Graph Interaction Network for Video Question Answering
Jungin Park
Jiyoung Lee
K. Sohn
114
99
0
29 Apr 2021
1