2305.12474
Cited By
v1
v2
v3 (latest)
Evaluating the Performance of Large Language Models on GAOKAO Benchmark
21 May 2023
Xiaotian Zhang
Chun-yan Li
Yi Zong
Zhengyu Ying
Liang He
Xipeng Qiu
ALM
ELM
Papers citing
"Evaluating the Performance of Large Language Models on GAOKAO Benchmark"
50 / 66 papers shown
RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Haofu Qian
Fangcheng Shi
Zijie Meng
...
Zheyong Xie
Zheyu Ye
Zhe Xu
Yao Hu
Shaosheng Cao
ALM
199
0
0
10 Nov 2025
EduAdapt: A Question Answer Benchmark Dataset for Evaluating Grade-Level Adaptability in LLMs
Numaan Naeem
Abdellah El Mekki
Muhammad Abdul-Mageed
AI4Ed
ELM
230
0
0
20 Oct 2025
Code-driven Number Sequence Calculation: Enhancing the inductive Reasoning Abilities of Large Language Models
Kedi Chen
Zhikai Lei
Xu Guo
Xuecheng Wu
Siyuan Zeng
...
J. Zhou
Liang He
Qipeng Guo
Kai Chen
Wei-na Zhang
AIMat
AI4TS
LRM
312
0
0
16 Oct 2025
FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis
Fengbin Zhu
Xiang Yao Ng
Ziyang Liu
Chang Liu
Xianwei Zeng
...
Fuli Feng
Richang Hong
Huanbo Luan
Ke-Wei Huang
Tat-Seng Chua
AIFin
208
0
0
15 Oct 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math
Shrey Pandit
Austin Xu
Xuan-Phi Nguyen
Yifei Ming
Caiming Xiong
Shafiq Joty
LRM
176
2
0
15 Oct 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites
Zhenxin Lei
Zhangwei Gao
Changyao Tian
Erfei Cui
Guanzhou Chen
...
Xiangyu Zhao
Jiayi Ji
Yu Qiao
Wenhai Wang
Gen Luo
VLM
245
0
0
14 Oct 2025
Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning
Zhiwen Ruan
Yixia Li
He Zhu
Yun Chen
P. Li
Yang Liu
Guanhua Chen
LRM
119
1
0
13 Oct 2025
SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning
Ruohao Li
Hongjun Liu
Leyi Zhao
Zisu Li
Jiawei Li
Jiajun Jiang
Linning Xu
Chen Zhao
Mingming Fan
Chen Liang
LLMAG
LRM
129
0
0
11 Oct 2025
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Yiran Shen
Yu Xia
Jonathan D. Chang
Prithviraj Ammanabrolu
160
0
0
01 Oct 2025
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
Haizhong Zheng
Jiawei Zhao
Bedi Chen
OffRL
129
3
0
01 Oct 2025
Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities
Jiayi Kuang
Haojing Huang
Yinghui Li
Xinnian Liang
Zhikun Xu
...
Xiaoyu Tan
Chao Qu
Meishan Zhang
Ying Shen
Philip S. Yu
LRM
153
5
0
30 Sep 2025
PiERN: Token-Level Routing for Integrating High-Precision Computation and Reasoning
Hengbo Xiao
Jingyuan Fan
Xin Tong
Jingzhao Zhang
Chao Lu
Guannan He
MoE
172
0
0
17 Sep 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM
LRM
290
246
0
25 Aug 2025
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR
Xiao Liang
Zhongzhi Li
Yeyun Gong
Yelong Shen
Y. Wu
Zhijiang Guo
Weizhu Chen
LRM
211
24
0
19 Aug 2025
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
Chengliang Zhou
Mei Wang
Ting Zhang
Qiannan Zhu
Jian Li
Hua Huang
AI4Ed
ELM
215
1
0
05 Aug 2025
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Shudong Liu
Hongwei Liu
Junnan Liu
Linchen Xiao
Songyang Gao
...
Yuzhe Gu
Wenwei Zhang
Yang Li
Songyang Zhang
Kai Chen
144
12
0
05 Aug 2025
Technical Report of TeleChat2, TeleChat2.5 and T1
Zihan Wang
Xinzhang Liu
Yitong Yao
Chao Wang
Yu Zhao
...
Bingkai Yang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AI4TS
LRM
386
5
0
24 Jul 2025
Towards Greater Leverage: Scaling Laws for Efficient Mixture-of-Experts Language Models
Changxin Tian
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Zhiqiang Zhang
Jun Zhou
MoE
375
8
0
23 Jul 2025
WSM: Decay-Free Learning Rate Schedule via Checkpoint Merging for LLM Pre-training
Changxin Tian
Jiapeng Wang
Qian Zhao
Kunlong Chen
Jia-Ling Liu
Ziqi Liu
Jiaxin Mao
Wayne Xin Zhao
Zhiqiang Zhang
Jun Zhou
MoMe
CLL
244
6
0
23 Jul 2025
RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
Fei Zhao
Chonggang Lu
Yue Wang
Zheyong Xie
Ziyan Liu
...
Jun Fan
Xiaolong Jiang
Weiting Liu
Boyang Wang
Shaosheng Cao
ALM
193
0
0
13 Jul 2025
MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMs
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yongqi Fan
Yating Wang
Guandong Wang
Jie Zhai
Jingping Liu
Qi Ye
Tong Ruan
151
0
0
18 Jun 2025
Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
Workshop on Innovative Use of NLP for Building Educational Applications (BEA), 2025
Zhenjiang Mao
Artem Bisliouk
Rohith Reddy Nama
Ivan Ruchkin
ReLM
LRM
166
3
0
09 Jun 2025
VisioMath: Benchmarking Figure-based Mathematical Reasoning in LMMs
Can Li
Ting Zhang
Mei Wang
Hua Huang
LRM
190
4
0
07 Jun 2025
STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wenhao Liu
Zhenyi Lu
Xinyu Hu
Jierui Zhang
Dailin Li
...
Pei Zhang
Chengbo Zhang
Yuxiang Ren
Xiaohong Huang
Yan Ma
OffRL
290
3
0
02 Jun 2025
From Objectives to Questions: A Planning-based Framework for Educational Mathematical Question Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Cheng Cheng
Z. Huang
Guanhao Zhao
Yuxiang Guo
Xin Lin
J. Wu
Xin Li
Shijin Wang
220
1
0
01 Jun 2025
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting
Yifan Wu
Jingze Shi
Yiran Peng
Jiayi Zhang
Xiaotian Lin
Nan Tang
Yuyu Luo
LRM
268
8
0
26 May 2025
Assessing the Capability of LLMs in Solving POSCOMP Questions
Cayo Viegas
Rohit Gheyi
Márcio Ribeiro
ELM
79
1
0
24 May 2025
T²: An Adaptive Test-Time Scaling Strategy for Contextual Question Answering
Zhengyi Zhao
Shubo Zhang
Zezhong Wang
Huimin Wang
Yutian Zhao
Bin Liang
Yefeng Zheng
Binyang Li
Kam-Fai Wong
X. Wu
LRM
277
2
0
23 May 2025
TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
Jinyang Wu
Chonghua Liao
Mingkuan Feng
Shuai Zhang
Zhengqi Wen
Pengpeng Shao
Huazhe Xu
Jianhua Tao
OffRL
LRM
527
13
0
21 May 2025
Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Sunghwan Kim
Dongjin Kang
Taeyoon Kwon
Hyungjoo Chae
Dongha Lee
Jinyoung Yeo
ALM
328
2
0
19 May 2025
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
Peichao Lai
Jianchao Tan
Yi Lin
Lingling Zhang
Feiyang Ye
...
Zifei Shan
Bin Wang
Longji Xu
Wentao Zhang
Bin Cui
ELM
LRM
459
0
0
12 May 2025
QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation
Mengze Hong
Wailing Ng
Chen Zhang
ELM
296
7
0
08 May 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM
VLM
537
770
1
14 Apr 2025
Can the capability of Large Language Models be described by human ability? A Meta Study
Mingrui Zan
Yunquan Zhang
Boyang Zhang
Fangming Liu
Daning Cheng
ELM
LM&MA
254
1
0
13 Apr 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
Shihong Deng
Dongbin Zhao
LRM
501
13
0
17 Mar 2025
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning
Bo Liu
Yunxiang Li
Yangqiu Song
Hanjing Wang
Linyi Yang
...
Jun Wang
Weinan Zhang
Shuyue Hu
Ying Wen
LLMAG
KELM
LRM
AI4CE
491
35
0
12 Mar 2025
MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
LRM
280
1
0
27 Feb 2025
Baichuan-M1: Pushing the Medical Capability of Large Language Models
Binghai Wang
Haizhou Zhao
Huozhi Zhou
Liang Song
Mingyu Xu
...
Yan Zhang
Yifei Duan
Yuyan Zhou
Zhi-Ming Ma
Zhikai Wu
LM&MA
ELM
AI4MH
362
31
0
18 Feb 2025
Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis
AAAI Conference on Artificial Intelligence (AAAI), 2025
Lin Yuan
Jun Xu
Honghao Gui
Mengshu Sun
Qing Cui
Lei Liang
Jun Zhou
AI4CE
814
2
0
06 Feb 2025
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu
Qiyun Xu
Tong Xiao
Tianhao Chen
Yuchen Yan
Jiaxin Zhang
Boyao Wang
Can Yang
Yang Wang
LRM
AI4CE
ELM
779
20
0
01 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
324
63
0
28 Jan 2025
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Haotian Luo
Li Shen
Haiying He
Yun Wang
Shiwei Liu
Wei Li
Naiqiang Tan
Xiaochun Cao
Dacheng Tao
VLM
LRM
506
180
0
22 Jan 2025
Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models
Journal of Artificial Intelligence Research (JAIR), 2025
Kaleem Ullah Qasim
Jiashu Zhang
Tariq Alsahfi
Ateeq Ur Rehman Butt
LRM
ReLM
295
3
0
03 Jan 2025
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
498
179
1
15 Nov 2024
UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts
Bo Yang
Qingping Yang
Runtao Liu
LRM
ReLM
ELM
AIMat
348
6
0
11 Nov 2024
Number Cookbook: Number Understanding of Language Models and How to Improve It
International Conference on Learning Representations (ICLR), 2024
Haotong Yang
Yi Hu
Shijia Kang
Zhouchen Lin
Muhan Zhang
LRM
484
29
0
06 Nov 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Liwen Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
514
14
0
24 Oct 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
The Web Conference (WWW), 2024
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
352
6
0
19 Sep 2024
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses
Yulong Chen
Yang Liu
Jianhao Yan
X. Bai
Ming Zhong
Yinghao Yang
Ziyi Yang
Chenguang Zhu
Yue Zhang
ALM
ELM
202
17
0
16 Aug 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
283
68
0
18 Jun 2024