Evaluation of OpenAI o1: Opportunities and Challenges of AGI
Main: 11 Pages
177 Figures
8 Tables
Appendix: 270 Pages
Abstract
This comprehensive study evaluates the performance of OpenAI's o1-preview large language model across a diverse array of complex reasoning tasks, spanning multiple domains, including computer science, mathematics, natural sciences, medicine, linguistics, and social sciences. Through rigorous testing, o1-preview demonstrated remarkable capabilities, often achieving human-level or superior performance in areas ranging from coding challenges to scientific reasoning and from language processing to creative problem-solving. Key findings include:
