2401.15641
PRE: A Peer Review Based Large Language Model Evaluator
28 January 2024
Zhumin Chu
Qingyao Ai
Yiteng Tu
Haitao Li
Yiqun Liu
LRM
ALM
Papers citing
"PRE: A Peer Review Based Large Language Model Evaluator"
17 / 17 papers shown
Title
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
0
0
26 Apr 2025
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
56
0
0
19 Mar 2025
DAFE: LLM-Based Evaluation Through Dynamic Arbitration for Free-Form Question-Answering
Sher Badshah
Hassan Sajjad
60
1
0
11 Mar 2025
LexRAG: Benchmarking Retrieval-Augmented Generation in Multi-Turn Legal Consultation Conversation
Haitao Li
Y. Chen
Yiran Hu
Qingyao Ai
Junjie Chen
Xiaoyu Yang
J. Yang
Yueyue Wu
Zeyang Liu
Y. Liu
AILaw
RALM
ELM
59
0
0
28 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
66
8
0
24 Feb 2025
LegalAgentBench: Evaluating LLM Agents in Legal Domain
H. Li
Junjie Chen
Jingli Yang
Qingyao Ai
Wei Jia
...
Guozhi Yuan
Yiran Hu
Wuyue Wang
Y. Liu
Minlie Huang
LLMAG
AILaw
ELM
48
11
0
23 Dec 2024
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
Fabrizio Davide
Pietro Torre
Andrea Gaggioli
ELM
90
0
0
12 Dec 2024
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
Haitao Li
Junjie Chen
Qingyao Ai
Zhumin Chu
Yujia Zhou
Qian Dong
Yiqun Liu
32
8
0
20 Oct 2024
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation
Junjie Chen
Weihang Su
Zhumin Chu
Haitao Li
Qingyao Ai
Yiqun Liu
Min Zhang
Shaoping Ma
22
3
0
16 Oct 2024
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
Haitao Li
You Chen
Qingyao Ai
Yueyue Wu
Ruizhe Zhang
Yiqun Liu
ALM
AILaw
ELM
44
8
0
30 Sep 2024
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
Justin Zhao
Flor Miriam Plaza del Arco
A. C. Curry
Amanda Cercas Curry
ELM
ALM
30
1
0
12 Jun 2024
Auto Arena of LLMs: Automating LLM Evaluations with Agent Peer-battles and Committee Discussions
Ruochen Zhao
Wenxuan Zhang
Yew Ken Chia
Deli Zhao
Lidong Bing
30
9
0
30 May 2024
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models
Haitao Li
Qingyao Ai
Jia Chen
Qian Dong
Zhijing Wu
Yiqun Liu
Chong Chen
Qi Tian
AILaw
48
13
0
27 Mar 2024
Large Language Models for Data Annotation: A Survey
Zhen Tan
Dawei Li
Song Wang
Alimohammad Beigi
Bohan Jiang
Amrita Bhattacharjee
Mansooreh Karami
Jundong Li
Lu Cheng
Huan Liu
SyDa
42
44
0
21 Feb 2024
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie
Wenlin Yao
Yong Dai
Shaobo Wang
Donlin Zhou
...
Zhichao Hu
Dong Yu
Zhengyou Zhang
Jing Nie
Yuhong Liu
ELM
ALM
11
4
0
09 Nov 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022