Evaluating the Performance of Large Language Models on GAOKAO Benchmark

21 May 2023
Xiaotian Zhang
Chun-yan Li
Yi Zong
Zhengyu Ying
Liang He
Xipeng Qiu
ALM, ELM

Papers citing "Evaluating the Performance of Large Language Models on GAOKAO Benchmark"

50 / 66 papers shown
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
203
0
0
10 Nov 2025
EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
Numaan Naeem
Abdellah El Mekki
Muhammad Abdul-Mageed
AI4Ed, ELM
242
0
0
20 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat, AI4TS, LRM
325
0
0
16 Oct 2025
FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis
Fengbin Zhu
Xiang Yao Ng
Ziyang Liu
Chang Liu
Xianwei Zeng
...
Fuli Feng
Richang Hong
Huanbo Luan
Ke-Wei Huang
Tat-Seng Chua
AIFin
225
2
0
15 Oct 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
Shrey Pandit
Austin Xu
Xuan-Phi Nguyen
Yifei Ming
Caiming Xiong
Shafiq Joty
LRM
183
3
0
15 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
248
0
0
14 Oct 2025
Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning
Zhiwen Ruan
Yixia Li
He Zhu
Yun Chen
P. Li
Yang Liu
Guanhua Chen
LRM
136
5
0
13 Oct 2025
SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning
Ruohao Li
Hongjun Liu
Leyi Zhao
Zisu Li
Jiawei Li
Jiajun Jiang
Linning Xu
Chen Zhao
Mingming Fan
Chen Liang
LLMAG, LRM
141
0
0
11 Oct 2025
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Yiran Shen
Yu Xia
Jonathan D. Chang
Prithviraj Ammanabrolu
160
0
0
01 Oct 2025
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Haizhong Zheng
Jiawei Zhao
Beidi Chen
OffRL
154
5
0
01 Oct 2025
Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities
Jiayi Kuang
Haojing Huang
Yinghui Li
Xinnian Liang
Zhikun Xu
...
Xiaoyu Tan
Chao Qu
Meishan Zhang
Ying Shen
Philip S. Yu
LRM
171
5
0
30 Sep 2025
PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
Hengbo Xiao
Jingyuan Fan
Xin Tong
Jingzhao Zhang
Chao Lu
Guannan He
MoE
188
0
0
17 Sep 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM, LRM
304
279
0
25 Aug 2025
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Xiao Liang
Zhongzhi Li
Yeyun Gong
Yelong Shen
Y. Wu
Zhijiang Guo
Weizhu Chen
LRM
236
24
0
19 Aug 2025
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
Chengliang Zhou
Mei Wang
Ting Zhang
Qiannan Zhu
Jian Li
Hua Huang
AI4Ed, ELM
230
1
0
05 Aug 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Shudong Liu
Hongwei Liu
Junnan Liu
Linchen Xiao
Songyang Gao
...
Yuzhe Gu
Wenwei Zhang
Yang Li
Songyang Zhang
Kai Chen
151
15
0
05 Aug 2025
Technical Report of TeleChat2, TeleChat2.5 and T1
Zihan Wang
Xinzhang Liu
Yitong Yao
Chao Wang
Yu Zhao
...
Bingkai Yang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AI4TS, LRM
426
6
0
24 Jul 2025
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Changxin Tian
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Zhiqiang Zhang
Jun Zhou
MoE
385
12
0
23 Jul 2025
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian
Jiapeng Wang
Qian Zhao
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Jiaxin Mao
Wayne Xin Zhao
Zhiqiang Zhang
Jun Zhou
MoMe, CLL
259
6
0
23 Jul 2025
RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Yue Wang
Zheyong Xie
Ziyan Liu
...
Jun Fan
Xiaolong Jiang
Weiting Liu
Boyang Wang
Shaosheng Cao
ALM
219
0
0
13 Jul 2025
MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yongqi Fan
Yating Wang
Guandong Wang
Jie Zhai
Jingping Liu
Qi Ye
Tong Ruan
162
0
0
18 Jun 2025
Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2025
Zhenjiang Mao
Artem Bisliouk
Rohith Reddy Nama
Ivan Ruchkin
ReLM, LRM
170
6
0
09 Jun 2025
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
Can Li
Ting Zhang
Ting Zhang
Mei Wang
Hua Huang
LRM
219
4
0
07 Jun 2025
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wenhao Liu
Zhenyi Lu
Xinyu Hu
Jierui Zhang
Dailin Li
...
Pei Zhang
Chengbo Zhang
Yuxiang Ren
Xiaohong Huang
Yan Ma
OffRL
300
4
0
02 Jun 2025
From Objectives to Questions: A Planning-based Framework for Educational Mathematical Question Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Cheng Cheng
Z. Huang
Guanhao Zhao
Yuxiang Guo
Xin Lin
J. Wu
Xin Li
Shijin Wang
236
1
0
01 Jun 2025
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
Yifan Wu
Jingze Shi
Yiran Peng
Jiayi Zhang
Xiaotian Lin
Nan Tang
Yuyu Luo
LRM
292
9
0
26 May 2025
Assessing the Capability of LLMs in Solving POSCOMP Questions
Cayo Viegas
Rohit Gheyi
Márcio Ribeiro
ELM
102
1
0
24 May 2025
T²: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering
Zhengyi Zhao
Shubo Zhang
Zezhong Wang
Huimin Wang
Yutian Zhao
Bin Liang
Yefeng Zheng
Binyang Li
Kam-Fai Wong
X. Wu
LRM
298
2
0
23 May 2025
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
Jianhua Tao
OffRL, LRM
547
13
0
21 May 2025
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Dongha Lee
Jinyoung Yeo
ALM
332
2
0
19 May 2025
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
Peichao Lai
Jianchao Tan
Yi Lin
Lingling Zhang
Feiyang Ye
...
Zifei Shan
Bin Wang
Longji Xu
Wentao Zhang
Bin Cui
ELM, LRM
479
0
0
12 May 2025
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong
Wailing Ng
Chen Zhang
Chen Zhang
ELM
310
7
0
08 May 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
613
806
1
14 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM, LM&MA
256
1
0
13 Apr 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
Shihong Deng
Dongbin Zhao
LRM
535
13
0
17 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
...
Jun Wang
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG, KELM, LRM, AI4CE
515
35
0
12 Mar 2025
MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
LRM
300
1
0
27 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA, ELM, AI4MH
384
32
0
18 Feb 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
AAAI Conference on Artificial Intelligence (AAAI), 2025
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Qing Cui
Lei Liang
Jun Zhou
AI4CE
842
2
0
06 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Boyao Wang
Can Yang
Yang Wang
LRM, AI4CE, ELM
809
23
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
328
66
0
28 Jan 2025
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Haotian Luo
Li Shen
Haiying He
Yun Wang
Shiwei Liu
Wei Li
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
VLM, LRM
524
185
0
22 Jan 2025
Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models
Journal of Artificial Intelligence Research (JAIR), 2025
Kaleem Ullah Qasim
Jiashu Zhang
Tariq Alsahfi
Ateeq Ur Rehman Butt
LRM, ReLM
309
4
0
03 Jan 2025
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
524
184
1
15 Nov 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
Runtao Liu
LRM, ReLM, ELM, AIMat
394
7
0
11 Nov 2024
Number Cookbook: Number Understanding of Language Models and How to Improve It
International Conference on Learning Representations (ICLR), 2024
Haotong Yang
Yi Hu
Shijia Kang
Zhouchen Lin
Muhan Zhang
LRM
498
31
0
06 Nov 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
533
14
0
24 Oct 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
The Web Conference (WWW), 2024
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
Jing Qin
AI4Ed, ELM
373
6
0
19 Sep 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen
Yang Liu
Jianhao Yan
X. Bai
Ming Zhong
Yinghao Yang
Ziyi Yang
Chenguang Zhu
Yue Zhang
ALM, ELM
205
18
0
16 Aug 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM, LRM
299
72
0
18 Jun 2024