ResearchTrend.AI
LLMEval: A Preliminary Study on How to Evaluate Large Language Models

12 December 2023
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
    ALM
    ELM

Papers citing "LLMEval: A Preliminary Study on How to Evaluate Large Language Models"

6 / 6 papers shown
Stability in Single-Peaked Strategic Resource Selection Games
Henri Zeiler
21
3
0
09 May 2025
am-ELO: A Stable Framework for Arena-based LLM Evaluation
Zirui Liu
Jiatong Li
Yan Zhuang
Q. Liu
Shuanghong Shen
Jie Ouyang
Mingyue Cheng
Shijin Wang
30
0
0
06 May 2025
What Makes an Evaluation Useful? Common Pitfalls and Best Practices
Gil Gekker
Meirav Segal
Dan Lahav
Omer Nevo
ELM
38
0
0
30 Mar 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
81
4
0
20 Feb 2025
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks
Guanhua Zhang
Moritz Hardt
35
7
0
02 May 2024