ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.16821
  4. Cited By
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal
  Models with Open-Source Suites

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

25 April 2024
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
Erfei Cui
Wenwen Tong
Kongzhi Hu
Jiapeng Luo
Zheng Ma
Ji Ma
Jiaqi Wang
Xiao-wen Dong
Hang Yan
Hewei Guo
Conghui He
Botian Shi
Zhenjiang Jin
Chaochao Xu
Bin Wang
Xingjian Wei
Wei Li
Wenjian Zhang
Bo-Wen Zhang
Pinlong Cai
Licheng Wen
Xiangchao Yan
Min Dou
Lewei Lu
Xizhou Zhu
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
    MLLM
    VLM
ArXivPDFHTML

Papers citing "How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites"

10 / 110 papers shown
Title
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
61
39
0
11 Dec 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
  Modality Collaboration
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
114
367
0
07 Nov 2023
ControlLLM: Augment Language Models with Tools by Searching on Graphs
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu
Zeqiang Lai
Zhangwei Gao
Erfei Cui
Ziheng Li
...
Lewei Lu
Qifeng Chen
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
123
20
0
26 Oct 2023
UReader: Universal OCR-free Visually-situated Language Understanding
  with Multimodal Large Language Model
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Feiyan Huang
VLM
MLLM
118
83
0
08 Oct 2023
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and
  Chinese Large Models
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models
Conghui He
Zhenjiang Jin
Chaoxi Xu
Jiantao Qiu
Bin Wang
Wei Li
Hang Yan
Jiaqi Wang
Da Lin
56
32
0
21 Aug 2023
VideoLLM: Modeling Video Sequence with Large Language Models
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
89
51
0
22 May 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Learn to Explain: Multimodal Reasoning via Thought Chains for Science
  Question Answering
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
A. Kalyan
ELM
ReLM
LRM
198
1,089
0
20 Sep 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in
  Natural Images
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
169
458
0
26 Jan 2016
Previous
123