2306.05685
Cited By
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
9 June 2023
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
Yonghao Zhuang
Zi Lin
Zhuohan Li
Dacheng Li
Eric P. Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
Papers citing
"Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena"
50 / 2,880 papers shown
Title
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
Xing Han Lù
Amirhossein Kazemnejad
Nicholas Meade
Arkil Patel
Dongchan Shin
Alejandra Zambrano
Karolina Stańczak
Peter Shaw
Christopher Pal
Siva Reddy
LLMAG
40
1
0
11 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
35
0
0
11 Apr 2025
Large Language Models Could Be Rote Learners
Yuyang Xu
Renjun Hu
Haochao Ying
Jian Wu
Xing Shi
Wei Lin
ELM
160
0
0
11 Apr 2025
Fast-Slow-Thinking: Complex Task Solving with Large Language Models
Yiliu Sun
Yanfang Zhang
Zicheng Zhao
Sheng Wan
Dacheng Tao
Chen Gong
LRM
35
0
0
11 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
52
2
0
11 Apr 2025
VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering
Qi Zhi Lim
C. Lee
K. Lim
Kalaiarasi Sonai Muthu Anbananthen
31
0
0
11 Apr 2025
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge
Riccardo Cantini
A. Orsino
Massimo Ruggiero
Domenico Talia
AAML
ELM
45
0
0
10 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Y. Cao
Dahua Lin
Jiaqi Wang
OffRL
60
1
0
10 Apr 2025
Enhanced Question-Answering for Skill-based learning using Knowledge-based AI and Generative AI
Rahul K. Dass
Rochan H. Madhusudhana
Erin C. Deye
Shashank Verma
Timothy A. Bydlon
Grace Brazil
Ashok K. Goel
24
1
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Yong-Jin Liu
Qi Wang
Fuzheng Zhang
VLM
63
1
0
10 Apr 2025
NorEval: A Norwegian Language Understanding and Generation Evaluation Benchmark
Vladislav Mikhailov
Tita Ranveig Enstad
David Samuel
Hans Christian Farsethås
Andrey Kutuzov
Erik Velldal
Lilja Øvrelid
ELM
45
0
0
10 Apr 2025
A System for Comprehensive Assessment of RAG Frameworks
Mattia Rengo
Senad Beadini
Domenico Alfano
Roberto Abbruzzese
45
1
0
10 Apr 2025
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery
Amirhossein Abaskohi
A. Ramesh
Shailesh Nanisetty
Chirag Goel
David Vazquez
Christopher Pal
Spandana Gella
Giuseppe Carenini
I. Laradji
39
0
0
10 Apr 2025
From Speech to Summary: A Comprehensive Survey of Speech Summarization
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
44
0
0
10 Apr 2025
Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation
Bo Zhang
Hui Ma
Dailin Li
Jian Ding
Jian Wang
Bo Xu
Hongfei Lin
KELM
44
0
0
10 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG
ELM
45
0
0
10 Apr 2025
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
Henghao Zhao
Ge-Peng Ji
Rui Yan
Huan Xiong
Zechao Li
24
0
0
10 Apr 2025
2D-Curri-DPO: Two-Dimensional Curriculum Learning for Direct Preference Optimization
Mengyang Li
Zhong Zhang
27
0
0
10 Apr 2025
Synthesizing High-Quality Programming Tasks with LLM-based Expert and Student Agents
Manh Hung Nguyen
Victor-Alexandru Pădurean
Alkis Gotovos
Sebastian Tschiatschek
Adish Singla
24
0
0
10 Apr 2025
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Boyuan Zheng
Michael Y. Fatemi
Xiaolong Jin
Zhilin Wang
Apurva Gandhi
...
Yu Gu
Jayanth Srinivasa
Gaowen Liu
Graham Neubig
Yu Su
CLL
41
1
0
09 Apr 2025
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
Jiacheng Liu
Taylor Blanton
Yanai Elazar
Sewon Min
YenSung Chen
...
Sophie Lebrecht
Yejin Choi
Hannaneh Hajishirzi
Ali Farhadi
Jesse Dodge
36
1
0
09 Apr 2025
Beyond Reproducibility: Advancing Zero-shot LLM Reranking Efficiency with Setwise Insertion
Jakub Podolak
Leon Peric
Mina Janicijevic
Roxana Petcu
23
0
0
09 Apr 2025
Toward Holistic Evaluation of Recommender Systems Powered by Generative Models
Yashar Deldjoo
Nikhil Mehta
M. Sathiamoorthy
Shuai Zhang
Pablo Castells
Julian McAuley
EGVM
ELM
69
1
0
09 Apr 2025
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Zhouhang Xie
Junda Wu
Yiran Shen
Yu Xia
Xintong Li
...
Sachin Kumar
Bodhisattwa Prasad Majumder
Jingbo Shang
Prithviraj Ammanabrolu
Julian McAuley
39
0
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yixuan Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
72
0
0
09 Apr 2025
Don't Let It Hallucinate: Premise Verification via Retrieval-Augmented Logical Reasoning
Yuehan Qin
Shawn Li
Yi Nian
Xinyan Velocity Yu
Yue Zhao
Xuezhe Ma
HILM
LRM
37
0
0
08 Apr 2025
Knowledge-Instruct: Effective Continual Pre-training from Limited Data using Instructions
O. Ovadia
Meni Brief
Rachel Lemberg
Eitam Sheetrit
CLL
KELM
47
0
0
08 Apr 2025
Separator Injection Attack: Uncovering Dialogue Biases in Large Language Models Caused by Role Separators
Xitao Li
Haoran Wang
Jiang Wu
Ting Liu
AAML
26
0
0
08 Apr 2025
Can LLMs Simulate Personas with Reversed Performance? A Benchmark for Counterfactual Instruction Following
Sai Adith Senthil Kumar
Hao Yan
Saipavan Perepa
Murong Yue
Ziyu Yao
62
0
0
08 Apr 2025
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao
Haoran Xu
Amy Zhang
Weinan Zhang
Chenjia Bai
33
0
0
08 Apr 2025
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Shuzhang Zhong
Yizhou Sun
Ling Liang
Runsheng Wang
R. Huang
Meng Li
MoE
61
1
0
08 Apr 2025
Single-Agent vs. Multi-Agent LLM Strategies for Automated Student Reflection Assessment
Gen Li
Li Chen
Cheng Tang
Valdemar Švábenský
Daisuke Deguchi
Takayoshi Yamashita
Atsushi Shimada
LLMAG
57
0
0
08 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
29
0
0
07 Apr 2025
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification
Anqi Zhang
Yulin Chen
Jane Pan
Chen Zhao
Aurojit Panda
Jinyang Li
He He
ReLM
LRM
44
3
0
07 Apr 2025
R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation
Martin Weyssow
Chengran Yang
Junkai Chen
Yikun Li
Huihui Huang
...
Han Wei Ang
Frank Liauw
Eng Lieh Ouh
Lwin Khin Shar
David Lo
LRM
33
0
0
07 Apr 2025
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation
Xinglin Lyu
Wei Tang
Yongqian Li
X. Zhao
Ming Zhu
...
Yaojie Lu
Min Zhang
Daimeng Wei
Hao Yang
Min Zhang
76
0
0
07 Apr 2025
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
Minki Kang
Jongwon Jeong
Jaewoong Cho
ALM
LRM
46
2
0
07 Apr 2025
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa
OffRL
ReLM
LRM
111
3
0
07 Apr 2025
SmolVLM: Redefining small and efficient multimodal models
Andres Marafioti
Orr Zohar
Miquel Farré
Merve Noyan
Elie Bakouch
...
Hugo Larcher
Mathieu Morlon
Lewis Tunstall
Leandro von Werra
Thomas Wolf
VLM
39
7
0
07 Apr 2025
CREA: A Collaborative Multi-Agent Framework for Creative Content Generation with Diffusion Models
Kavana Venkatesh
Connor Dunlop
Pinar Yanardag
DiffM
40
0
0
07 Apr 2025
EduPlanner: LLM-Based Multi-Agent Systems for Customized and Intelligent Instructional Design
Xiaotian Zhang
Chao Zhang
Jianwen Sun
Jun Xiao
Yi Yang
Yawei Luo
LLMAG
AI4Ed
53
0
0
07 Apr 2025
A Llama walks into the 'Bar': Efficient Supervised Fine-Tuning for Legal Reasoning in the Multi-state Bar Exam
Rean Fernandes
André Biedenkapp
Frank Hutter
Noor H. Awad
ALM
ELM
LRM
45
0
0
07 Apr 2025
CARE: Aligning Language Models for Regional Cultural Awareness
Geyang Guo
Tarek Naous
Hiromi Wakaki
Yukiko Nishimura
Yuki Mitsufuji
Alan Ritter
Wei-ping Xu
52
0
0
07 Apr 2025
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han
Siyuan Li
Jiaqi Chen
Yiwen Yuan
Yuling Wu
...
Y. Li
Jingyang Zhang
Chi Zhang
Li Li
Yongxin Ni
EGVM
VGen
73
0
0
07 Apr 2025
NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang
Harshita Diddee
Susan Holm
Hanchen Liu
Xinyue Liu
Vinay Samuel
Barry Wang
Daphne Ippolito
31
1
0
07 Apr 2025
SECQUE: A Benchmark for Evaluating Real-World Financial Analysis Capabilities
Noga Ben Yoash
Meni Brief
O. Ovadia
Gil Shenderovitz
Moshik Mishaeli
Rachel Lemberg
Eitam Sheetrit
ELM
AIFin
28
0
0
06 Apr 2025
ArxivBench: Can LLMs Assist Researchers in Conducting Research?
Ning Li
Jingran Zhang
Justin Cui
29
0
0
06 Apr 2025
Advancing Egocentric Video Question Answering with Multimodal Large Language Models
Alkesh Patel
Vibhav Chitalia
Yinfei Yang
25
0
0
06 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
42
1
0
05 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
47
1
0
05 Apr 2025