ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.05662
  4. Cited By
InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT
  Beyond Language

InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language

9 May 2023
Zhaoyang Liu
Yinan He
Wenhai Wang
Weiyun Wang
Yi Wang
Shoufa Chen
Qing-Long Zhang
Zeqiang Lai
Yang Yang
Qingyun Li
Jiashuo Yu
Kunchang Li
Zhe Chen
Xuecheng Yang
Xizhou Zhu
Yali Wang
Limin Wang
Ping Luo
Jifeng Dai
Yu Qiao
    LRM
    MLLM
ArXivPDFHTML

Papers citing "InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language"

25 / 25 papers shown
Title
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Think Before You Segment: High-Quality Reasoning Segmentation with GPT Chain of Thoughts
Shiu-hong Kao
Yu-Wing Tai
Chi-Keung Tang
LRM
MLLM
52
0
0
10 Mar 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
94
48
0
03 Jan 2025
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRM
LM&Ro
79
1
0
02 Dec 2024
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu
J. Wang
Yuan Meng
Yanning Zhang
Le Sun
Zhi Wang
156
0
0
25 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
57
46
1
15 Nov 2024
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe
Hanan Gani
Wenqi Zhu
Jiale Cao
Eric P. Xing
F. Khan
Salman Khan
MLLM
VGen
VLM
44
6
0
07 Nov 2024
ViLLa: Video Reasoning Segmentation with Large Language Model
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng
Lu Qi
Xi Chen
Yi Wang
Kun Wang
Yu Qiao
Hengshuang Zhao
VOS
LRM
52
2
0
18 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and
  Editing
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
41
25
0
08 Jul 2024
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing
Ying Jin
Pengyang Ling
Xiao-wen Dong
Pan Zhang
Jiaqi Wang
Dahua Lin
24
2
0
18 May 2024
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision
Siddhant Bansal
Michael Wray
Dima Damen
31
3
0
15 Apr 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual
  Perception
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
24
103
0
29 Jan 2024
LISA++: An Improved Baseline for Reasoning Segmentation with Large
  Language Model
LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model
Senqiao Yang
Tianyuan Qu
Xin Lai
Zhuotao Tian
Bohao Peng
Shu-Lin Liu
Jiaya Jia
VLM
21
28
0
28 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
38
29
0
19 Dec 2023
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models
Shehan Munasinghe
Rusiru Thushara
Muhammad Maaz
H. Rasheed
Salman Khan
Mubarak Shah
Fahad Khan
VLM
MLLM
17
34
0
22 Nov 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
83
224
0
07 Jul 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
32
89
0
14 May 2023
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
245
1,071
0
05 Oct 2022
Large Language Models are Zero-Shot Reasoners
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
295
4,077
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,236
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
306
11,909
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
390
4,124
0
28 Jan 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,448
0
28 Jan 2022
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
263
3,622
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
278
1,981
0
09 Feb 2021
Rethinking Rotated Object Detection with Gaussian Wasserstein Distance
  Loss
Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss
Xue Yang
Junchi Yan
Qi Ming
Wentao Wang
Xiaopeng Zhang
Qi Tian
106
398
0
28 Jan 2021
1