Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.06003
Cited By
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
9 April 2024
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models"
9 / 9 papers shown
Title
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
58
1
0
15 Feb 2025
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang
Ruoxi Ning
Boqi Pan
Tonghui Wu
Qipeng Guo
...
Guangsheng Bao
Xiangkun Hu
Zheng Zhang
Qian Wang
Yue Zhang
RALM
63
3
0
18 Mar 2024
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
105
136
0
03 Nov 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
206
559
0
03 May 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
203
2,232
0
22 Mar 2023
Leveraging Large Language Models for Multiple Choice Question Answering
Joshua Robinson
Christopher Rytting
David Wingate
ELM
126
181
0
22 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation
Marzena Karpinska
Nader Akoury
Mohit Iyyer
204
106
0
14 Sep 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,927
0
20 Apr 2018
1