arXiv:2307.03109
A Survey on Evaluation of Large Language Models

6 July 2023
Yupeng Chang
Xu Wang
Jindong Wang
Yuan Wu
Linyi Yang
Kaijie Zhu
Hao Chen
Xiaoyuan Yi
Cunxiang Wang
Yidong Wang
Wei Ye
Yue Zhang
Yi Chang
Philip S. Yu
Qiang Yang
Xing Xie
    ELM
    LM&MA
    ALM

Papers citing "A Survey on Evaluation of Large Language Models"

50 / 126 papers shown
Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
Maor Reuben
Ortal Slobodin
Aviad Elyshar
Idan-Chaim Cohen
Orna Braun-Lewensohn
Odeya Cohen
Rami Puzis
35
0
0
29 Sep 2024
Recent Advances in OOD Detection: Problems and Approaches
Shuo Lu
YingSheng Wang
Lijun Sheng
Aihua Zheng
Lingxiao He
Jian Liang
OODD
42
2
0
18 Sep 2024
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models
Haoran Ye
Yuhang Xie
Yuanyi Ren
Hanjun Fang
Xin Zhang
Guojie Song
LM&MA
27
1
0
18 Sep 2024
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text
Michael Burnham
Kayla Kahn
Ryan Yank Wang
Rachel X. Peng
27
4
0
03 Sep 2024
Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning
M. Lee
Ju Lin
Li-Ta Hsu
13
0
0
22 Aug 2024
TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation
Xingpeng Sun
Yiran Zhang
Xindi Tang
Amrit Singh Bedi
Aniket Bera
37
4
0
03 Aug 2024
Speech-Guided Sequential Planning for Autonomous Navigation using Large Language Model Meta AI 3 (Llama3)
Alkesh K. Srivastava
Philip Dames
LLMAG
LM&Ro
35
1
0
13 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
42
5
0
11 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
31
37
0
09 Jul 2024
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
36
1
0
08 Jul 2024
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
29
8
0
05 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
49
0
0
04 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
54
18
0
02 Jul 2024
LLM2FEA: Discover Novel Designs with Generative Evolutionary Multitasking
Melvin Wong
Jiao Liu
Thiago Rios
Stefan Menzel
Yew-Soon Ong
34
2
0
21 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
45
55
0
18 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
54
5
0
17 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
51
3
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
68
22
0
17 Jun 2024
Ontology Embedding: A Survey of Methods, Applications and Resources
Jiaoyan Chen
Olga Mashkova
Fernando Zhapa-Camacho
R. Hoehndorf
Yuan He
Ian Horrocks
27
4
0
16 Jun 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
33
4
0
14 Jun 2024
Adversarial Evasion Attack Efficiency against Large Language Models
João Vitorino
Eva Maia
Isabel Praça
AAML
24
2
0
12 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
39
10
0
12 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker
Stefano Soatto
24
6
0
05 Jun 2024
Large Language Models as Evaluators for Recommendation Explanations
Xiaoyu Zhang
Yishan Li
Jiayin Wang
Bowen Sun
Weizhi Ma
Peijie Sun
Min Zhang
LRM
ELM
27
12
0
05 Jun 2024
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen
38
5
0
24 May 2024
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Cong Lu
Shengran Hu
Jeff Clune
LLMAG
29
9
0
24 May 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
52
78
0
13 May 2024
Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management
Bingzhang Wang
Zhiyu Cai
Muhammad Monjurul Karim
Chenxi Liu
Yinhai Wang
23
6
0
05 May 2024
NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli
Xu Wang
Cheng-rong Li
Yi-Ju Chang
Jindong Wang
Yuan Wu
25
7
0
05 May 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
54
36
0
23 Apr 2024
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
Denis Donadel
Francesco Marchiori
Luca Pajola
Mauro Conti
21
7
0
19 Apr 2024
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
Taojun Hu
Xiao-Hua Zhou
ELM
16
12
0
14 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
35
6
0
12 Apr 2024
Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning
Teo Susnjak
Peter Hwang
N. Reyes
A. Barczak
Timothy R. McIntosh
Surangika Ranathunga
41
22
0
08 Apr 2024
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso
Martín Bertrán
Riccardo Fogliato
Aaron Roth
19
12
0
06 Apr 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
60
49
0
02 Apr 2024
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Xiaoze Liu
Feijie Wu
Tianyang Xu
Zhuo Chen
Yichi Zhang
Xiaoqian Wang
Jing Gao
HILM
33
8
0
01 Apr 2024
Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)
Kaiqi Yang
Yucheng Chu
Taylor Darwin
Ahreum Han
Hang Li
Hongzhi Wen
Yasemin Copur-Gencturk
Jiliang Tang
Hui Liu
16
12
0
22 Mar 2024
Large Language Models for Blockchain Security: A Systematic Literature Review
Zheyuan He
Zihao Li
Sen Yang
Ao Qiao
Xiaosong Zhang
Xiapu Luo
Ting Chen
PILM
42
14
0
21 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
39
7
0
29 Feb 2024
COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
Baihan Lin
Djallel Bouneffouf
Yulia Landa
Rachel Jespersen
Cheryl Corcoran
Guillermo Cecchi
34
1
0
22 Feb 2024
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Zhen Guo
Adriana Meza Soria
Wei Sun
Yikang Shen
Rameswar Panda
ELM
ALM
34
1
0
14 Feb 2024
Utilizing Large Language Models to Detect Privacy Leaks in Mini-App Code
Liming Jiang
6
1
0
12 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
107
347
0
09 Feb 2024
Enhancing Zero-shot Counting via Language-guided Exemplar Learning
Mingjie Wang
Jun Zhou
Yong Dai
Eric Buys
Minglun Gong
23
0
0
08 Feb 2024
Adversarial Text Purification: A Large Language Model Approach for Defense
Raha Moraffah
Shubh Khandelwal
Amrita Bhattacharjee
Huan Liu
DeLMO
AAML
15
5
0
05 Feb 2024
Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency Requirement
Holger Boche
Adalbert Fono
Gitta Kutyniok
FaML
23
4
0
18 Jan 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu
Mingyu Jin
Suiyuan Zhu
Beichen Wang
Zihao Zhou
Chong Zhang
Yongfeng Zhang
ELM
37
12
0
17 Jan 2024
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
6
109
0
08 Nov 2023
BC4LLM: Trusted Artificial Intelligence When Blockchain Meets Large Language Models
Haoxiang Luo
Jian Luo
Athanasios V. Vasilakos
16
9
0
10 Oct 2023