GPT-4 Technical Report

OpenAI

Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mo Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, Madelaine Boyd, Anna-Luisa Brakman, Greg Brockman, Tim Brooks, Miles Brundage, Kevin Button, Trevor Cai, Rosie Campbell, Andrew Cann, Brittany Carey, Chelsea Carlson, Rory Carmichael, Brooke Chan, Che Chang, Fotis Chantzis, Derek Chen, Sully Chen, Ruby Chen, Jason Chen, Mark Chen, Benjamin Chess, Chester Cho, Casey Chu, Hyung Won Chung, Dave Cummings, Jeremiah Currier, Yunxing Dai, Cory Decareaux, Thomas Degry, Noah Deutsch, Damien Deville, Arka Dhar, David Dohan, Steve Dowling, Sheila Dunning, Adrien Ecoffet, Atty Eleti, Tyna Eloundou, David Farhi, Liam Fedus, Niko Felix, Simón Posada Fishman, Juston Forte, Isabella Fulford, Leo Gao, Elie Georges, Christian Gibson, Vik Goel, Tarun Gogineni, Gabriel Goh, Raphael Gontijo-Lopes, Jonathan Gordon, Morgan Grafstein, Scott Gray, Ryan Greene, Joshua Gross, Shixiang Shane Gu, Yufei Guo, Chris Hallacy, Jesse Michael Han, Jeff Harris, Yuchen He, Mike Heaton, Johannes Heidecke, Chris Hesse, Alan Hickey, Wade Hickey, Peter Hoeschele, Brandon Houghton, Kenny Hsu, Shengli Hu, Xin Hu, Joost Huizinga, Shantanu Jain, Shawn Jain, Joanne Jang, Angela Jiang, Roger Jiang, Haozhun Jin, Denny Jin, Shino Jomoto, Billie Jonn, Heewoo Jun, Tomer Kaftan, Lukasz Kaiser, Ali Kamali, Ingmar Kanitscheider, Nitish Shirish Keskar, Tabarak Khan, Logan Kilpatrick, Jong Wook Kim, Christina Kim, Yongjik Kim, Hendrik Kirchner, Jamie Kiros, Matthew Knight, Daniel Kokotajlo, Lukasz Kondraciuk, Andrew Kondrich, Aris Konstantinidis, Kyle Kosic, Gretchen Krueger, Vishal Kuo, Michael Lampe, Ikai Lan, Teddy Lee, Jan Leike, Jade Leung, Daniel Levy, Chak Ming Li, Rachel Lim, Molly Lin, Stephanie Lin, Mateusz Litwin, Theresa Lopez, Ryan J. Lowe, Patricia Lue, Anna Makanju, Kim Malfacini, Sam Manning, Todor Markov, Yaniv Markovski, Bianca Martin, Katie Mayer, Andrew Mayne, Bob McGrew, Scott Mayer McKinney, Christine McLeavey, Paul McMillan, Jake McNeil, David Medina, Aalok Mehta, Jacob Menick, Luke Metz, Andrey Mishchenko, Pamela Mishkin, Vinnie Monaco, Evan Morikawa, Daniel P. Mossing, Tong Mu, Mira Murati, Oleg Murk, David A. Mély, Ashvin Nair, Reiichiro Nakano, Rajeev Nayak, Arvind Neelakantan, Richard Ngo, Hyeonwoo Noh, Long Ouyang, Cullen O'Keefe, Jakub Pachocki, Alex Paino, Joe Palermo, Ashley Pantuliano, Giambattista Parascandolo, Joel Parish, Emy Parparita, Alexandre Passos, Mikhail Pavlov, Andrew Peng, Adam Perelman, Filipe de Avila Belbute Peres, Michael Petrov, Henrique Pondé de Oliveira Pinto, Michael Pokorny, Michelle Pokrass, Vitchyr H. Pong, Tolly Powell, Alethea Power, Boris Power, Elizabeth Proehl, Raul Puri, Alec Radford, Jack W. Rae, Aditya A. Ramesh, Cameron Raymond, Francis Real, Kendra Rimbach, Carl Ross, Bob Rotsted, Henri Roussez, Nick Ryder, Mario Saltarelli, Ted Sanders, Shibani Santurkar, Girish Sastry, Heather Schmidt, David Schnurr, John Schulman, Daniel Selsam, Kyla Sheppard, Toki Sherbakov, Jessica Shieh, Sarah Shoker, Pranav Shyam, Szymon Sidor, Eric Sigler, Maddie Simens, Jordan Sitkin, Katarina Slama, Ian Sohl, Benjamin D. Sokolowsky, Yang Song, Natalie Staudacher, Felipe Petroski Such, Natalie Summers, Ilya Sutskever, Jie Tang, Nikolas Tezak, Madeleine Thompson, Phil Tillet, Amin Tootoonchian, Elizabeth Tseng, Preston Tuggle, Nick Turley, Jerry Tworek, Juan Felipe Cerón Uribe, Andrea Vallone, Arun Vijayvergiya, Chelsea Voss, Carroll L. Wainwright, Justin Jay Wang, Alvin Wang, Ben Wang, Jonathan Ward, Jason W. Wei, CJ Weinmann, Akila Welihinda, Peter Welinder, Jiayi Weng, Lilian Weng, Matt Wiethoff, Dave Willner, Clemens Winter, Samuel Wolrich, Hannah Wong, Lauren Workman, Sherwin Wu, Jeff Wu, Michael Wu, Kai Y. Xiao, Tao Xu, Sarah Yoo, Kevin Yu, Qiming Yuan, Wojciech Zaremba, Rowan Zellers, Chong Zhang, Marvin Zhang, Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph
Abstract

We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process results in improved performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
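The final two sentences describe the report's predictable-scaling methodology: some aspects of the final model's performance, such as final loss, were extrapolated from much smaller training runs by fitting a power law with an irreducible-loss term, L(C) = aC^b + c. The sketch below illustrates that fit-and-extrapolate step under stated assumptions; it is not OpenAI's actual code, and the compute fractions, loss values, and initial parameter guesses are invented purely for demonstration.

```python
# Minimal sketch of the predictable-scaling idea from the abstract:
# fit L(C) = a * C**b + c to the final loss of small training runs,
# then extrapolate to the full training compute. All numbers here are
# illustrative placeholders, not data from the GPT-4 report.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, a, b, c):
    # Power law in compute with an irreducible-loss term c, which
    # models the part of the loss no amount of compute removes.
    return a * np.power(compute, b) + c

# Hypothetical (compute fraction, final loss) pairs from small runs,
# with the largest using 1/1,000th of the target model's compute.
compute = np.array([1e-7, 1e-6, 1e-5, 1e-4, 1e-3])
loss = np.array([4.5, 3.9, 3.4, 3.0, 2.7])

# Fit the three parameters; p0 is an arbitrary but reasonable starting guess.
(a, b, c), _ = curve_fit(scaling_law, compute, loss,
                         p0=(1.0, -0.1, 1.0), maxfev=10000)

# Extrapolate three orders of magnitude past the largest fitted run
# (compute = 1.0 denotes the full-scale model's training compute).
predicted = scaling_law(1.0, a, b, c)
print(f"fit: a={a:.3f}, b={b:.3f}, c={c:.3f}")
print(f"predicted final loss at full compute: {predicted:.3f}")
```

The irreducible term c is what makes extrapolation across three orders of magnitude plausible: a pure power law would predict loss falling toward zero, while the offset captures the floor the fitted curve approaches.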
