An Early Evaluation of GPT-4V(ision)

25 October 2023
Yang Wu
Shilong Wang
Hao Yang
Tian Zheng
Hongbo Zhang
Yanyan Zhao
Bing Qin
MLLM, ELM

Papers citing "An Early Evaluation of GPT-4V(ision)"

29 / 29 papers shown
A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation
Zihui Ma
Jinkui Chi
Juan Li
Qingfeng Lan
Jingxiao Liu
Qingyuan Feng
Yuki Miura
171
0
0
03 Jun 2025
Supporting Preschool Emotional Development with AI-Powered Robots
International Conference on Interaction Design and Children (IDC), 2025
Santiago Berrezueta-Guzman
María Dolón-Poza
Stefan Wagner
108
1
0
24 May 2025
TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration
Yanshu Li
Tian Yun
Pinyuan Feng
Jinfa Huang
Ruixiang Tang
418
23
0
21 May 2025
Evaluating Compositional Scene Understanding in Multimodal Generative Models
Shuhao Fu
Andrew Jun Lee
Anna Wang
Ida Momennejad
Trevor Bihl
Hongjing Lu
Taylor Webb
CoGe, OCL
304
3
0
29 Mar 2025
3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o
Dingning Liu
Cheng Wang
Peng Gao
Renrui Zhang
Cheng Wang
Yuan Meng
Zhihui Wang
LRM
297
5
0
17 Mar 2025
Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations
Yanshu Li
404
4
0
05 Mar 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM, LRM
322
12
0
24 Feb 2025
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Z. Zhang
Siru Ouyang
Hongming Zhang
Meng Jiang
Dong Yu
VLM
357
12
0
02 Oct 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
LRM
192
15
0
28 Jun 2024
GPT-4V Explorations: Mining Autonomous Driving
Zixuan Li
169
2
0
24 Jun 2024
MotionLLM: Understanding Human Behaviors from Human Motions and Videos
Ling-Hao Chen
Shunlin Lu
Ailing Zeng
Hao Zhang
Benyou Wang
Ruimao Zhang
Lei Zhang
228
71
0
30 May 2024
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding
Haoyu Zhao
Wenhang Ge
Ying-Cong Chen
ObjD, MLLM, VLM
281
7
0
27 May 2024
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
Gyeong-Geon Lee
Xiaoming Zhai
192
23
0
12 May 2024
A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphael Milliere
Cameron Buckner
LRM
278
24
0
06 May 2024
MileBench: Benchmarking MLLMs in Long Context
Dingjie Song
Shunian Chen
Guiming Hardy Chen
Fei Yu
Xiang Wan
Benyou Wang
VLM
350
62
0
29 Apr 2024
Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models
Jesse Atuhurra
Iqra Ali
Tatsuya Hiraoka
Hidetaka Kamigaito
Tomoya Iwakura
Taro Watanabe
233
1
0
29 Mar 2024
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models
Xueliang Zhao
Xinting Huang
Tingchen Fu
Qintong Li
Shansan Gong
Lemao Liu
Wei Bi
Lingpeng Kong
LRM
286
4
0
21 Feb 2024
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM, LRM
298
54
0
19 Feb 2024
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li
Zhihang Hu
Yixuan Wang
Lei Li
Yimin Fan
Irwin King
Le Song
Yu Li
AI4CE
215
40
0
06 Feb 2024
Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering
Biophysics Reports (BR), 2024
Qing Li
Lei Li
Yu Li
LM&MA, AI4MH
446
18
0
15 Jan 2024
DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content
Wentao Wang
Xuanyao Huang
Tianyang Wang
Swalpa Kumar Roy
EGVM
276
1
0
16 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Computer Vision and Pattern Recognition (CVPR), 2023
Mohammad Reza Taesiri
Tianjun Feng
Anh Totti Nguyen
Cor-Paul Bezemer
MLLM, VLM, LRM
307
18
0
08 Dec 2023
Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
Jonathan Roberts
Timo Lüddecke
Rehan Sheikh
Kai Han
Samuel Albanie
MLLM
449
40
0
24 Nov 2023
NERIF: GPT-4V for Automatic Scoring of Drawn Models
Gyeong-Geon Lee
Xiaoming Zhai
302
13
0
21 Nov 2023
AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
Junyang Wang
Yuhang Wang
Guohai Xu
Jing Zhang
Yukai Gu
...
Yuan Liu
Haiyang Xu
Ming Yan
Ji Zhang
Jitao Sang
MLLM, VLM
246
186
0
13 Nov 2023
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection
Jiangning Zhang
Haoyang He
Xuhai Chen
Zhucun Xue
Yabiao Wang
Chengjie Wang
Lei Xie
Yong Liu
MLLM
267
29
0
05 Nov 2023
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
medRxiv (medRxiv), 2023
Yingshu Li
Yunyi Liu
Zhanyu Wang
Xinyu Liang
Lei Wang
Lingqiao Liu
Leyang Cui
Zhaopeng Tu
Longyue Wang
Luping Zhou
ELM, LM&MA
320
0
0
31 Oct 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Chunjiang Ge
Yulei Qin
Mengdan Zhang
...
Xing Sun
Zhenyu Qiu
Rongrong Ji
Caifeng Shan
Ran He
ELM, MLLM
769
1,219
0
23 Jun 2023
Domain Generalization for Mammographic Image Analysis with Contrastive Learning
Zheren Li
Zhiming Cui
Lichi Zhang
Sheng Wang
Chenjin Lei
...
Yajia Gu
Zaiyi Liu
Chunling Liu
Dinggang Shen
Jie‐Zhi Cheng
572
3
0
20 Apr 2023