ResearchTrend.AI
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric P. Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
    ALM
    OSLM
    ELM

Papers citing "Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"

50 / 2,880 papers shown
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
41
22
0
31 Aug 2023
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal
John Dang
Aditya Grover
ALM
35
20
0
30 Aug 2023
Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation
Dawei Gao
Haibin Wang
Yaliang Li
Xiuyu Sun
Yichen Qian
Bolin Ding
Jingren Zhou
AI4TS
54
239
0
29 Aug 2023
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding
Yushi Bai
Xin Lv
Jiajie Zhang
Hong Lyu
Jiankai Tang
...
Aohan Zeng
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
LLMAG
RALM
31
496
0
28 Aug 2023
ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baolin Zhang
Hai-Yong Xie
Pengfan Du
Junhao Chen
Pengfei Cao
Yubo Chen
Shengping Liu
Kang Liu
Jun Zhao
ELM
ALM
24
1
0
28 Aug 2023
DISC-MedLLM: Bridging General Large Language Models and Real-World Medical Consultation
Zhijie Bao
Wei Chen
Shengze Xiao
Kuang Ren
Jiaao Wu
Cheng Zhong
J. Peng
Xuanjing Huang
Zhongyu Wei
LM&MA
19
71
0
28 Aug 2023
Evaluating the Robustness to Instructions of Large Language Models
Yuansheng Ni
Sichao Jiang
Xinyu Wu
Hui Shen
Yuli Zhou
ALM
30
2
0
28 Aug 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
62
4
0
28 Aug 2023
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models
Kaiyuan Gao
Su He
Zhenyu He
Jiacheng Lin
Qizhi Pei
Jie Shao
Wei Zhang
LM&MA
SyDa
32
4
0
27 Aug 2023
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun
Yang Han
Zihan Zhao
Da Ma
Zhe-Wei Shen
Baocai Chen
Lu Chen
Kai Yu
ELM
45
70
0
25 Aug 2023
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao
Mengzhao Chen
Zhaoyang Zhang
Peng-Tao Xu
Lirui Zhao
Zhiqiang Li
Kaipeng Zhang
Peng Gao
Yu Qiao
Ping Luo
MQ
15
176
0
25 Aug 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin
Haoxuan Che
Yi-Mou Lin
Haoxing Chen
MedIm
34
57
0
24 Aug 2023
Aligning Language Models with Offline Learning from Human Feedback
Jian Hu
Li Tao
J. Yang
Chandler Zhou
ALM
OffRL
27
7
0
23 Aug 2023
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning
Ming Li
Yong Zhang
Zhitao Li
Jiuhai Chen
Lichang Chen
Ning Cheng
Jianzong Wang
Dinesh Manocha
Jing Xiao
38
170
0
23 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Jindong Wang
Xing Xie
ALM
27
42
0
23 Aug 2023
Target-Grounded Graph-Aware Transformer for Aerial Vision-and-Dialog Navigation
Yi-Chiao Su
Dongyan An
Yuan Xu
Kehan Chen
Yan Huang
49
2
0
22 Aug 2023
Giraffe: Adventures in Expanding Context Lengths in LLMs
Arka Pal
Deep Karkhanis
Manley Roberts
Samuel Dooley
Arvind Sundararajan
Siddartha Naidu
16
39
0
21 Aug 2023
Instruction Tuning for Large Language Models: A Survey
Shengyu Zhang
Linfeng Dong
Xiaoya Li
Sen Zhang
Xiaofei Sun
...
Jiwei Li
Runyi Hu
Tianwei Zhang
Fei Wu
Guoyin Wang
LM&MA
24
538
0
21 Aug 2023
SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Tianyu Yu
Chengyue Jiang
Chao Lou
Shen Huang
Xiaobin Wang
...
Haitao Zheng
Ningyu Zhang
Pengjun Xie
Fei Huang
Yong-jia Jiang
LRM
57
17
0
21 Aug 2023
PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator
Chuyi Kong
Yaxin Fan
Xiang Wan
Feng Jiang
Benyou Wang
37
7
0
21 Aug 2023
ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
Zhuolun He
Haoyuan Wu
Xinyun Zhang
Xufeng Yao
Su Zheng
Haisheng Zheng
Bei Yu
LLMAG
32
50
0
20 Aug 2023
UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding
Hao Feng
Zijian Wang
Jingqun Tang
Jinghui Lu
Wen-gang Zhou
Houqiang Li
Can Huang
MLLM
VLM
42
46
0
19 Aug 2023
GameEval: Evaluating LLMs on Conversational Games
Dan Qiao
Chenfei Wu
Yaobo Liang
Juntao Li
Nan Duan
ELM
LLMAG
24
20
0
19 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu
Y. Xu
Y. Li
W. Li
Zhengzhang Chen
Z. Tu
MLLM
VLM
30
122
0
19 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
19
127
0
18 Aug 2023
End-to-End Beam Retrieval for Multi-Hop Question Answering
Jiahao Zhang
H. Zhang
Dongmei Zhang
Yong Liu
Sheng Huang
RALM
28
23
0
17 Aug 2023
CMB: A Comprehensive Medical Benchmark in Chinese
Xidong Wang
Guiming Hardy Chen
Dingjie Song
Zhiyi Zhang
Zhihong Chen
...
Feng Jiang
Jianquan Li
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
ELM
AI4MH
30
79
0
17 Aug 2023
Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection
Zekun Li
Baolin Peng
Pengcheng He
Xifeng Yan
ELM
SILM
AAML
41
23
0
17 Aug 2023
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Junru Lu
Siyu An
Mingbao Lin
Gabriele Pergola
Yulan He
Di Yin
Xing Sun
Yunsheng Wu
47
31
0
16 Aug 2023
From Commit Message Generation to History-Aware Commit Message Completion
Aleksandra V. Eliseeva
Yaroslav Sokolov
Egor Bogomolov
Yaroslav Golubev
Danny Dig
T. Bryksin
30
20
0
15 Aug 2023
LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Xiaoming Shi
J. Xu
Jinru Ding
Jiali Pang
Sichen Liu
...
Lu Lu
Haihong Yang
Mingtao Hu
Tong Ruan
Shaoting Zhang
LM&MA
ELM
26
12
0
15 Aug 2023
A Survey on Model Compression for Large Language Models
Xunyu Zhu
Jian Li
Yong Liu
Can Ma
Weiping Wang
36
193
0
15 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
29
446
0
14 Aug 2023
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
K. Lu
Hongyi Yuan
Zheng Yuan
Runji Lin
Junyang Lin
Chuanqi Tan
Chang Zhou
Jingren Zhou
ALM
LRM
32
65
0
14 Aug 2023
Self-Alignment with Instruction Backtranslation
Xian Li
Ping Yu
Chunting Zhou
Timo Schick
Omer Levy
Luke Zettlemoyer
Jason Weston
M. Lewis
SyDa
29
123
0
11 Aug 2023
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
Zhiwei Liu
Weiran Yao
Jianguo Zhang
Le Xue
Shelby Heinecke
...
Ran Xu
P. Mùi
Haiquan Wang
Caiming Xiong
Silvio Savarese
LLMAG
34
83
0
11 Aug 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Ying Zhao
Yu Bowen
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
N. Zhang
42
22
0
10 Aug 2023
LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking
Fahim Dalvi
Maram Hasanain
Sabri Boughorbel
Basel Mousi
Samir Abdaljalil
...
Hamdy Mubarak
Ahmed M. Ali
Majd Hawasly
Nadir Durrani
Firoj Alam
25
24
0
09 Aug 2023
CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li
Jianqiao Zhao
Duo Zheng
Zi-Yuan Hu
Zhi Chen
...
Yongfeng Huang
Shijia Huang
Dahua Lin
Michael R. Lyu
Liwei Wang
ALM
ELM
33
10
0
09 Aug 2023
Generative Benchmark Creation for Table Union Search
Koyena Pal
Aamod Khatiwada
Roee Shraga
Renée J. Miller
43
0
0
07 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Wenqi Shao
Yutao Hu
Peng Gao
Meng Lei
Kaipeng Zhang
...
Peng-Tao Xu
Siyuan Huang
Hongsheng Li
Yuning Qiao
Ping Luo
VLM
MLLM
32
2
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
45
607
0
04 Aug 2023
Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text
Nandana Mihindukulasooriya
Sanju Tiwari
Carlos F. Enguix
K. Lata
31
52
0
04 Aug 2023
ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation
Chenglong Wang
Hang Zhou
Yimin Hu
Yi Huo
Bei Li
Tongran Liu
Tong Xiao
Jingbo Zhu
19
8
0
04 Aug 2023
A Survey of Spanish Clinical Language Models
Guillem García Subies
Á. Jiménez
Paloma Martínez
LM&MA
ELM
LRM
23
0
0
04 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
47
83
0
03 Aug 2023
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
Xueying Du
Mingwei Liu
Kaixin Wang
Hanlin Wang
Junwei Liu
Yixuan Chen
Jiayi Feng
Chaofeng Sha
Xin Peng
Yiling Lou
ELM
ALM
31
138
0
03 Aug 2023
Local Large Language Models for Complex Structured Medical Tasks
V. Bumgardner
Aaron D. Mullen
Samuel E. Armstrong
Caylin D. Hickey
Jeffrey A. Talbert
34
5
0
03 Aug 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Z. Yao
Reza Yazdani Aminabadi
Olatunji Ruwase
Samyam Rajbhandari
Xiaoxia Wu
...
Heyang Qin
Masahiro Tanaka
Shuai Che
Shuaiwen Leon Song
Yuxiong He
ALM
OffRL
23
68
0
02 Aug 2023
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation
Zhiqiang Yuan
Junwei Liu
Qiancheng Zi
Mingwei Liu
Xin Peng
Yiling Lou
ALM
ELM
LRM
17
73
0
02 Aug 2023