Benchmarking Foundation Models with Language-Model-as-an-Examiner

7 June 2023
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
Xiaozhi Wang
Jifan Yu
Kaisheng Zeng
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
    ALM
    ELM

Papers citing "Benchmarking Foundation Models with Language-Model-as-an-Examiner"

28 / 28 papers shown
LecEval: An Automated Metric for Multimodal Knowledge Acquisition in Multimedia Learning
Joy Lim Jia Yin
Daniel Zhang-Li
Jifan Yu
H. Li
Shangqing Tu
...
Zhiyuan Liu
Huiqin Liu
Lei Hou
Juanzi Li
Bin Xu
24
0
0
04 May 2025
Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers
Dylan Bouchard
Mohit Singh Chauhan
HILM
70
0
0
27 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
X. Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Yu Jiang
ALM
ELM
84
1
0
26 Apr 2025
An Illusion of Progress? Assessing the Current State of Web Agents
Tianci Xue
Weijian Qi
Tianneng Shi
Chan Hee Song
Boyu Gou
D. Song
Huan Sun
Yu Su
LLMAG
ELM
92
4
1
02 Apr 2025
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
Sizhe Wang
Yongqi Tong
Hengyuan Zhang
Dawei Li
Xin Zhang
Tianlong Chen
85
5
0
21 Feb 2025
Rate, Explain and Cite (REC): Enhanced Explanation and Attribution in Automatic Evaluation by Large Language Models
Aliyah R. Hsu
James Zhu
Zhichao Wang
Bin Bi
Shubham Mehrotra
...
Sougata Chaudhuri
Regunathan Radhakrishnan
S. Asur
Claire Na Cheng
Bin Yu
ALM
LRM
67
0
0
20 Feb 2025
ARIES: Stimulating Self-Refinement of Large Language Models by Iterative Preference Optimization
Yongcheng Zeng
Xinyu Cui
Xuanfa Jin
Guoqing Liu
Zexu Sun
...
Dong Li
Ning Yang
Jianye Hao
H. Zhang
J. Wang
LRM
LLMAG
82
1
0
08 Feb 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
52
96
0
03 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
113
65
0
25 Nov 2024
Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction
J. Vice
Naveed Akhtar
Richard I. Hartley
Ajmal Saeed Mian
DiffM
82
0
0
21 Nov 2024
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu
Yuxuan Lu
Grant Schoenebeck
Yuqing Kong
34
1
0
11 Nov 2024
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Y. Qi
Hao Peng
X. Wang
Bin Xu
Lei Hou
Juanzi Li
56
0
0
31 Oct 2024
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves?
Han Bao
Yue Huang
Yanbo Wang
Jiayi Ye
Xiangqi Wang
Xiuying Chen
Mohamed Elhoseiny
Xiangliang Zhang
47
7
0
28 Oct 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
35
5
0
17 Oct 2024
CREAM: Consistency Regularized Self-Rewarding Language Models
Z. Wang
Weilei He
Zhiyuan Liang
Xuchao Zhang
Chetan Bansal
Ying Wei
Weitong Zhang
Huaxiu Yao
ALM
96
7
0
16 Oct 2024
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
J. Zhang
ALM
LRM
66
4
0
11 Oct 2024
Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao
Yixuan Li
86
5
0
13 Sep 2024
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates
Hui Wei
Shenghua He
Tian Xia
Andy H. Wong
Jingyang Lin
Mei Han
ALM
ELM
59
23
0
23 Aug 2024
Automated Review Generation Method Based on Large Language Models
Shican Wu
Xiao Ma
Dehui Luo
Lulu Li
Xiangcheng Shi
...
Ran Luo
Chunlei Pei
Zhijian Zhao
Jinlong Gong
69
0
0
30 Jul 2024
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
Tianhao Wu
Weizhe Yuan
O. Yu. Golovneva
Jing Xu
Yuandong Tian
Jiantao Jiao
Jason Weston
Sainbayar Sukhbaatar
ALM
KELM
LRM
44
72
0
28 Jul 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
36
156
0
15 Apr 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
29
0
02 Feb 2024
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
235
298
0
18 Jan 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
254
2,232
0
22 Mar 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
245
1,071
0
05 Oct 2022
What's in a Name? Answer Equivalence For Open-Domain Question Answering
Chenglei Si
Chen Zhao
Jordan L. Boyd-Graber
151
35
0
11 Sep 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
226
4,453
0
23 Jan 2020