Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.08808
Cited By
Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge
16 August 2024
Ravi Raju
Swayambhoo Jain
Bo Li
Jonathan Li
Urmish Thakker
ALM
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge"
6 / 6 papers shown
Title
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
106
61
0
25 Nov 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
33
5
0
17 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
37
2
0
17 Oct 2024
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
45
1
0
15 Oct 2024
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
229
720
0
17 Apr 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
196
791
0
13 Sep 2019
1