ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.05685
  4. Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric P. Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM
ArXivPDFHTML

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,880 papers shown
Title
Bielik 11B v2 Technical Report
Bielik 11B v2 Technical Report
Krzysztof Ociepa
Łukasz Flis
Krzysztof Wróbel
Adrian Gwoździej
Remigiusz Kinas
34
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
Adaptive Thinking via Mode Policy Optimization for Social Language Agents
Adaptive Thinking via Mode Policy Optimization for Social Language Agents
Minzheng Wang
Y. Li
Haozhao Wang
Xinghua Zhang
Nan Xu
Bingli Wu
Fei Huang
Haiyang Yu
Wenji Mao
LLMAG
LRM
43
1
0
04 May 2025
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation
Meng-Hao Guo
Jiajun Xu
Yi Zhang
Jiaxi Song
Haoyang Peng
...
Yongming Rao
Houwen Peng
Han Hu
Gordon Wetzstein
Shi-Min Hu
ELM
LRM
60
2
0
04 May 2025
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Tobias Domhan
Dawei Zhu
30
0
0
03 May 2025
LookAlike: Consistent Distractor Generation in Math MCQs
LookAlike: Consistent Distractor Generation in Math MCQs
Nisarg Parikh
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew S. Lan
53
0
0
03 May 2025
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
Xiaojun Jia
Yingfei Sun
Qianqian Xu
Q. Huang
AAML
159
0
0
03 May 2025
Multi-agents based User Values Mining for Recommendation
Multi-agents based User Values Mining for Recommendation
L. Chen
Wei Yuan
Tong Chen
Xiangyu Zhao
Nguyen Quoc Viet Hung
Hongzhi Yin
OffRL
49
0
0
02 May 2025
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors
Harnessing Structured Knowledge: A Concept Map-Based Approach for High-Quality Multiple Choice Question Generation with Effective Distractors
Nicy Scaria
Silvester John Joseph Kennedy
Diksha Seth
Ananya Thakur
Deepak N. Subramani
AI4Ed
23
0
0
02 May 2025
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
61
0
0
02 May 2025
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li
Xiyang Wu
Guangyao Shi
Yubin Qin
Hongyang Du
Tianyi Zhou
Dinesh Manocha
Jordan Lee Boyd-Graber
MLLM
57
0
0
02 May 2025
Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors
Parameter-Efficient Fine-Tuning with Circulant and Diagonal Vectors
Xinyu Ding
Lexuan Chen
Siyu Liao
Zhongfeng Wang
49
0
0
01 May 2025
Block Circulant Adapter for Large Language Models
Block Circulant Adapter for Large Language Models
Xinyu Ding
Meiqi Wang
Siyu Liao
Zhongfeng Wang
38
0
0
01 May 2025
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
FineScope : Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation
Chaitali Bhattacharyya
Yeseong Kim
45
0
0
01 May 2025
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines
R. Manuvinakurike
Emanuel Moss
E. A. Watkins
Saurav Sahay
G. Raffa
L. Nachman
LRM
31
0
0
01 May 2025
DeepCritic: Deliberate Critique with Large Language Models
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM
LRM
30
0
0
01 May 2025
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
Bang Zhang
Ruotian Ma
Qingxuan Jiang
Peisong Wang
Jiaqi Chen
...
Fanghua Ye
Jian Li
Yifan Yang
Zhaopeng Tu
Xiaolong Li
LLMAG
ELM
ALM
109
0
1
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
65
1
0
01 May 2025
MINERVA: Evaluating Complex Video Reasoning
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani
Sachit Menon
Ahmet Iscen
Shyamal Buch
Ramin Mehran
...
Yukun Zhu
Carl Vondrick
Mikhail Sirotenko
Cordelia Schmid
Tobias Weyand
58
0
0
01 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
H. Wang
Y. Chen
Qingyun Wu
49
1
0
30 Apr 2025
An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding
An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding
Xiuwei Shang
Zhenkan Fu
Shaoyin Cheng
Guoqiang Chen
Gangyang Li
Li Hu
Wenbo Zhang
N. Yu
62
0
0
30 Apr 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
31
0
0
29 Apr 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
57
0
0
29 Apr 2025
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks
CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks
Rui Wang
Junda Wu
Yu Xia
Tong Yu
R. Zhang
Ryan Rossi
Lina Yao
Julian McAuley
AAML
SILM
51
0
0
29 Apr 2025
Computational Reasoning of Large Language Models
Computational Reasoning of Large Language Models
Haitao Wu
Zongbo Han
Huaxi Huang
Huaxi Huang
Changqing Zhang
ELM
LRM
62
0
0
29 Apr 2025
Learning Streaming Video Representation via Multitask Training
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers
AutoP2C: An LLM-Based Agent Framework for Code Repository Generation from Multimodal Content in Academic Papers
Zijie Lin
Yiqing Shen
Qilin Cai
He Sun
Jinrui Zhou
Mingjun Xiao
57
0
0
28 Apr 2025
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Chatbot Arena Meets Nuggets: Towards Explanations and Diagnostics in the Evaluation of LLM Responses
Sahel Sharifymoghaddam
Shivani Upadhyay
Nandan Thakur
Ronak Pradeep
Jimmy Lin
RALM
27
0
0
28 Apr 2025
$\texttt{SAGE}$: A Generic Framework for LLM Safety Evaluation
SAGE\texttt{SAGE}SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal
Hari Shrawgi
Parag Agrawal
Sandipan Dandapat
ELM
47
0
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
59
0
0
27 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Zekun Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Joey Tianyi Zhou
Huaxiu Yao
63
1
0
27 Apr 2025
Explanatory Summarization with Discourse-Driven Planning
Explanatory Summarization with Discourse-Driven Planning
Dongqi Liu
Xi Yu
Vera Demberg
Mirella Lapata
50
0
0
27 Apr 2025
When2Call: When (not) to Call Tools
When2Call: When (not) to Call Tools
Hayley Ross
Ameya Sunil Mahabaleshwarkar
Yoshi Suhara
93
0
0
26 Apr 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
2
0
26 Apr 2025
MATCHA: Can Multi-Agent Collaboration Build a Trustworthy Conversational Recommender?
MATCHA: Can Multi-Agent Collaboration Build a Trustworthy Conversational Recommender?
Zheng Hui
Xiaokai Wei
Yexi Jiang
Kevin Gao
Chen Wang
Frank Ong
Se-eun Yoon
Rachit Pareek
Michelle Gong
LLMAG
63
0
0
26 Apr 2025
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning
Towards Robust Dialogue Breakdown Detection: Addressing Disruptors in Large Language Models with Self-Guided Reasoning
Abdellah Ghassel
Xianzhi Li
Xiaodan Zhu
51
0
0
26 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Xiaozhong Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Heng Chang
Jun Zhu
MLLM
154
0
0
25 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
An Empirical Study of Evaluating Long-form Question Answering
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
32
0
0
25 Apr 2025
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Adversarial Attacks on LLM-as-a-Judge Systems: Insights from Prompt Injections
Narek Maloyan
Dmitry Namiot
SILM
AAML
ELM
83
0
0
25 Apr 2025
A Model Zoo on Phase Transitions in Neural Networks
A Model Zoo on Phase Transitions in Neural Networks
Konstantin Schurholt
Léo Meynent
Yefan Zhou
Haiquan Lu
Yaoqing Yang
Damian Borth
68
0
0
25 Apr 2025
CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
CoheMark: A Novel Sentence-Level Watermark for Enhanced Text Quality
Junyan Zhang
Shuliang Liu
Aiwei Liu
Yubo Gao
Jiajun Li
Xiaojie Gu
Xuming Hu
WaLM
60
2
0
24 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
Sung Ju Hwang
AI4CE
44
0
0
24 Apr 2025
A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation
A RAG-Based Multi-Agent LLM System for Natural Hazard Resilience and Adaptation
Yangxinyu Xie
Bowen Jiang
Tanwi Mallick
Joshua Bergerson
John K Hutchison
...
Robert B. Ross
Yan Feng
L. Levy
Weijie J. Su
Camillo J Taylor
32
1
0
24 Apr 2025
Planning with Diffusion Models for Target-Oriented Dialogue Systems
Planning with Diffusion Models for Target-Oriented Dialogue Systems
Hanwen Du
B. Peng
Xia Ning
25
0
0
23 Apr 2025
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Leveraging LLMs as Meta-Judges: A Multi-Agent Framework for Evaluating LLM Judgments
Y. Li
Jama Hussein Mohamud
Chongren Sun
Di Wu
Benoit Boulet
LLMAG
ELM
72
0
0
23 Apr 2025
Process Reward Models That Think
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
44
1
0
23 Apr 2025
How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study
How Individual Traits and Language Styles Shape Preferences In Open-ended User-LLM Interaction: A Preliminary Study
Rendi Chevi
Kentaro Inui
Thamar Solorio
Alham Fikri Aji
124
0
0
23 Apr 2025
V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
V2^22R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
50
0
0
23 Apr 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
Yuxin Jiang
Y. Wang
Chuhan Wu
Xinyi Dai
Yan Xu
...
Y. Wang
Xin Jiang
Lifeng Shang
R. Tang
Luu Anh Tuan
36
0
0
22 Apr 2025
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation
Chanyeol Choi
Jihoon Kwon
Jaeseon Ha
Hojun Choi
Chaewoon Kim
Yongjae Lee
Jy-yong Sohn
Alejandro Lopez-Lira
RALM
58
0
0
22 Apr 2025
Previous
12345...565758
Next