An Early Evaluation of GPT-4V(ision)

25 October 2023

Papers citing "An Early Evaluation of GPT-4V(ision)"

A Multimodal, Multilingual, and Multidimensional Pipeline for Fine-grained Crowdsourcing Earthquake Damage Evaluation

171

03 Jun 2025

Supporting Preschool Emotional Development with AI-Powered Robots

International Conference on Interaction Design and Children (IDC), 2025

Santiago Berrezueta-Guzman

María Dolón-Poza

Stefan Wagner

108

24 May 2025

TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration

418

21 May 2025

Evaluating Compositional Scene Understanding in Multimodal Generative Models

304

29 Mar 2025

3DAxisPrompt: Promoting the 3D Grounding and Reasoning in GPT-4o

297

17 Mar 2025

Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations

Yanshu Li

404

05 Mar 2025

Introducing Visual Perception Token into Multimodal Large Language Model

322

24 Feb 2025

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

357

02 Oct 2024

From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis

Wei Wu

192

28 Jun 2024

GPT-4V Explorations: Mining Autonomous Driving

Zixuan Li

169

24 Jun 2024

MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Ailing Zeng

Lei Zhang

228

30 May 2024

LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

281

27 May 2024

Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

Gyeong-Geon Lee

Xiaoming Zhai

192

12 May 2024

A Philosophical Introduction to Language Models - Part II: The Way Forward

Raphaël Millière

Cameron Buckner

LRM

278

06 May 2024

MileBench: Benchmarking MLLMs in Long Context

Xiang Wan

350

29 Apr 2024

Constructing Multilingual Visual-Text Datasets Revealing Visual Multilingual Ability of Vision Language Models

Hidetaka Kamigaito

Taro Watanabe

233

29 Mar 2024

BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models

Wei Bi

Lingpeng Kong

LRM

286

21 Feb 2024

Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models

Xuanyu Lei

Zonghan Yang

Xinrui Chen

Peng Li

Yang Liu

MLLM LRM

298

19 Feb 2024

Progress and Opportunities of Foundation Models in Bioinformatics

215

06 Feb 2024

Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering

Biophysics Reports (BR), 2024

446

15 Jan 2024

DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content

276

16 Dec 2023

GlitchBench: Can large multimodal models detect video game glitches?

Computer Vision and Pattern Recognition (CVPR), 2023

Mohammad Reza Taesiri

307

08 Dec 2023

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

449

24 Nov 2023

NERIF: GPT-4V for Automatic Scoring of Drawn Models

Gyeong-Geon Lee

Xiaoming Zhai

302

21 Nov 2023

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

...

Ming Yan

Ji Zhang

Jitao Sang

MLLM VLM

246

186

13 Nov 2023

GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection

Jiangning Zhang

Haoyang He

Xuhai Chen

Zhucun Xue

Yabiao Wang

Chengjie Wang

Lei Xie

Yong Liu

MLLM

267

05 Nov 2023

A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis

medRxiv (medRxiv), 2023

Yingshu Li

Yunyi Liu

Zhanyu Wang

Xinyu Liang

Lei Wang

Lingqiao Liu

Leyang Cui

320

31 Oct 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

...

769

1,219

23 Jun 2023

Domain Generalization for Mammographic Image Analysis with Contrastive Learning

...

572

20 Apr 2023