arXiv 2406.12753
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
18 June 2024
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
Ruijie Xu
Run-Ze Fan
Lyumanshan Ye
Ethan Chern
Yixin Ye
Yikai Zhang
Yuqing Yang
Ting Wu
Binjie Wang
Shichao Sun
Yang Xiao
Yiyuan Li
Fan Zhou
Steffi Chern
Yiwei Qin
Yan Ma
Jiadi Su
Yixiu Liu
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
Papers citing "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" (22 papers)
Crosslingual Reasoning through Test-Time Scaling
Zheng-Xin Yong
Muhammad Farid Adilazuarda
Jonibek Mansurov
Ruochen Zhang
Niklas Muennighoff
Carsten Eickhoff
Genta Indra Winata
Julia Kreutzer
Stephen H. Bach
Alham Fikri Aji
LRM
ELM
38
0
0
08 May 2025
Turing Machine Evaluation for Large Language Model
Haitao Wu
Zongbo Han
Huaxi Huang
Changqing Zhang
ELM
LRM
59
0
0
29 Apr 2025
Evaluating Multi-Hop Reasoning in Large Language Models: A Chemistry-Centric Case Study
Mohammad Khodadad
Ali Shiraee Kasmaee
Mahdi Astaraki
Nicholas Sherck
H. Mahyar
Soheila Samiee
LRM
27
0
0
23 Apr 2025
Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading
Qianjin Yu
Keyu Wu
Zihan Chen
Chushu Zhang
Manlin Mei
Lingjun Huang
Fang Tan
Yongsheng Du
Kunlin Liu
Yurui Zhu
ELM
LRM
41
0
0
16 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
49
2
0
14 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM
LRM
35
1
0
01 Apr 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
K. Zhang
ELM
51
0
0
09 Mar 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
101
1
0
21 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Shizhe Diao
Can Yang
Yang Wang
ELM
LRM
AI4CE
68
2
0
01 Feb 2025
Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap
Hyunwoo Ko
Guijin Son
Dasol Choi
RALM
LRM
50
7
0
05 Jan 2025
Guiding Through Complexity: What Makes Good Supervision for Hard Math Reasoning Tasks?
Xuan He
Da Yin
Nanyun Peng
LRM
26
0
0
27 Oct 2024
Subtle Errors Matter: Preference Learning via Error-injected Self-editing
Kaishuai Xu
Tiezheng Yu
Wenjun Hou
Yi Cheng
Chak Tou Leong
Liangyou Li
Xin Jiang
Lifeng Shang
Qun Liu
Wenjie Li
LRM
47
0
0
09 Oct 2024
Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks
Xingxuan Li
Weiwen Xu
Ruochen Zhao
Fangkai Jiao
Shafiq R. Joty
Lidong Bing
LRM
24
4
0
02 Oct 2024
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization
Mucong Ding
Chenghao Deng
Jocelyn Choo
Zichu Wu
Aakriti Agrawal
...
Tianyi Zhou
Tom Goldstein
John Langford
Anima Anandkumar
Furong Huang
29
1
0
27 Sep 2024
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
Jiayi Gui
Yiming Liu
Jiale Cheng
Xiaotao Gu
Xiao-Yang Liu
Hongning Wang
Yuxiao Dong
Jie Tang
Minlie Huang
ELM
LLMAG
LRM
32
2
0
28 Aug 2024
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model
Wenqi Zhang
Zhenglin Cheng
Yuanyu He
Mengna Wang
Yongliang Shen
...
Guiyang Hou
Mingqian He
Yanna Ma
Weiming Lu
Yueting Zhuang
SyDa
43
9
0
09 Jul 2024
Benchmarking Benchmark Leakage in Large Language Models
Ruijie Xu
Zengzhi Wang
Run-Ze Fan
Pengfei Liu
47
42
0
29 Apr 2024
Reformatted Alignment
Run-Ze Fan
Xuefeng Li
Haoyang Zou
Junlong Li
Shwai He
Ethan Chern
Jiewen Hu
Pengfei Liu
35
5
0
19 Feb 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
126
895
0
21 Dec 2023
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
189
614
0
20 May 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
274
882
0
18 Apr 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,003
0
20 Apr 2018