ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.06071
  4. Cited By
GroundingGPT:Language Enhanced Multi-modal Grounding Model

GroundingGPT:Language Enhanced Multi-modal Grounding Model

11 January 2024
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
Qi Qi
Ran Zhou
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
ArXivPDFHTML

Papers citing "GroundingGPT:Language Enhanced Multi-modal Grounding Model"

33 / 33 papers shown
Title
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
35
0
0
02 May 2025
Visual Intention Grounding for Egocentric Assistants
Visual Intention Grounding for Egocentric Assistants
Pengzhan Sun
Junbin Xiao
Tze Ho Elden Tse
Yicong Li
Arjun Akula
Angela Yao
EgoV
40
0
0
18 Apr 2025
The Zero Body Problem: Probing LLM Use of Sensory Language
The Zero Body Problem: Probing LLM Use of Sensory Language
Rebecca M. M. Hicke
Sil Hamilton
David M. Mimno
21
0
0
08 Apr 2025
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
Junlong Ren
Hao Wang
36
0
0
02 Apr 2025
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie
Tengda Han
Max Bain
Arsha Nagrani
Eshika Khandelwal
Gül Varol
Weidi Xie
Andrew Zisserman
DiffM
VGen
55
0
0
01 Apr 2025
BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation
BeMERC: Behavior-Aware MLLM-based Framework for Multimodal Emotion Recognition in Conversation
Yumeng Fu
Junjie Wu
Zhongjie Wang
Meishan Zhang
Yulin Wu
Bingquan Liu
39
0
0
31 Mar 2025
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du
Guangyao Li
Chang Zhou
Chunjie Zhang
Alan Zhao
D. Hu
49
0
0
17 Mar 2025
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou
Gim Hee Lee
37
0
0
10 Mar 2025
Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings
Jonghyun Lee
Dojun Park
Jiwoo Lee
Hoekeon Choi
Sung-Eun Lee
53
1
0
10 Mar 2025
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos
Kangda Wei
Zhengyu Zhou
Bingqing Wang
Jun Araki
Lukas Lange
Ruihong Huang
Z. Feng
32
0
0
28 Feb 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
J. Liu
Peng Wang
Guoqing Wang
Y. Yang
H. Shen
ObjD
74
0
0
27 Feb 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
106
2
0
14 Jan 2025
Towards Visual Grounding: A Survey
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
39
3
0
31 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall
94
1
0
12 Dec 2024
ReWind: Understanding Long Videos with Instructed Learnable Memory
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko
Tinghuai Wang
Wassim Swaileh
Shiyan Sun
Ioannis Patras
KELM
VLM
73
0
0
23 Nov 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment
  in Multi-Modal Models
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Wei Wang
Z. Li
Qi Xu
Linfeng Li
Yiqing Cai
Botian Jiang
Hang Song
Xingcan Hu
Pengyu Wang
Li Xiao
24
1
0
14 Nov 2024
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning
Xiangyu Zeng
Kunchang Li
Chenting Wang
Xinhao Li
Tianxiang Jiang
...
Zhengrong Yue
Yi Wang
Yali Wang
Yu Qiao
Limin Wang
MLLM
VLM
AI4TS
55
14
0
25 Oct 2024
The Curse of Multi-Modalities: Evaluating Hallucinations of Large
  Multimodal Models across Language, Visual, and Audio
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng
Yun Xing
Zesen Cheng
Yang Zhou
Hang Zhang
Xin Li
Deli Zhao
Shijian Lu
Chunyan Miao
Lidong Bing
22
5
0
16 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of
  MLLMs
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
13
3
0
16 Oct 2024
OMCAT: Omni Context Aware Transformer
OMCAT: Omni Context Aware Transformer
Arushi Goel
Karan Sapra
Matthieu Le
Rafael Valle
Andrew Tao
Bryan Catanzaro
MLLM
VLM
16
0
0
15 Oct 2024
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for
  Embodied AI
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI
Sijie Cheng
Kechen Fang
Yangyang Yu
Sicheng Zhou
B. Li
Ye Tian
Tingguang Li
Lei Han
Yang Janet Liu
37
8
0
15 Oct 2024
Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality
  Assessment
Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment
Kai Liu
Ziqing Zhang
Wenbo Li
Renjing Pei
Fenglong Song
Xiaohong Liu
Linghe Kong
Yulun Zhang
VLM
26
0
0
03 Oct 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled
  Multi-modal Fusion
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
19
9
0
26 Sep 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
31
3
0
23 Sep 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks
  With Large Language Model
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFin
LRM
27
3
0
05 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance
  and Challenges Across Different Tasks
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
29
18
0
02 Aug 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey
  from Co-Development Perspective
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
43
5
0
11 Jul 2024
Temporal Grounding of Activities using Multimodal Large Language Models
Temporal Grounding of Activities using Multimodal Large Language Models
Young Chol Song
22
0
0
30 May 2024
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Zicheng Zhang
Tong Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
QiXiang Ye
Wei Ke
VLM
31
2
0
13 Mar 2024
Video Understanding with Large Language Models: A Survey
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
47
76
0
29 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
87
68
0
05 Dec 2023
VTimeLLM: Empower LLM to Grasp Video Moments
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
64
80
0
30 Nov 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
1