A Survey on Evaluation of Large Language Models

6 July 2023
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
Kaijie Zhu
Hao Chen
Xiaoyuan Yi
Cunxiang Wang
Yidong Wang
Weirong Ye
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
    ELM
    LM&MA
    ALM

Papers citing "A Survey on Evaluation of Large Language Models"

50 / 124 papers shown
GenAI in Entrepreneurship: a systematic review of generative artificial intelligence in entrepreneurship research: current issues and future directions
Anna Kusetogullari
Huseyin Kusetogullari
Martin Andersson
Tony Gorschek
12
0
0
08 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
57
0
0
05 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
42
0
0
03 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
23
0
0
02 May 2025
UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces
Alaa Saleh
Sasu Tarkoma
Praveen Kumar Donta
Naser Hossein Motlagh
Schahram Dustdar
Susanna Pirttikangas
Lauri Lovén
39
0
0
01 May 2025
Robotic Visual Instruction
Y. Li
Ziyang Gong
H. Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
66
0
0
01 May 2025
An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination
Dixiao Wei
Peng Yi
Jinlong Lei
Yiguang Hong
Yuchuan Du
33
0
0
28 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
S.
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
92
0
0
28 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
79
0
0
25 Apr 2025
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
Simone Papicchio
Simone Rossi
Luca Cagliero
Paolo Papotti
ReLM
LMTD
AI4TS
LRM
51
0
0
21 Apr 2025
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Dong Wang
ELM
19
0
0
17 Apr 2025
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu
Arka Dutta
Sean Kim
Ashiqur R. KhudaBukhsh
Munmun De Choudhury
19
0
0
08 Apr 2025
Exploring Generative AI Techniques in Government: A Case Study
Sunyi Liu
Mengzhe Geng
Rebecca Hart
LLMAG
31
0
0
06 Apr 2025
Engineering Artificial Intelligence: Framework, Challenges, and Future Direction
Jay Lee
Hanqi Su
Dai-Yan Ji
Takanobu Minami
AI4CE
46
0
0
03 Apr 2025
Rethinking industrial artificial intelligence: a unified foundation framework
Jay Lee
Hanqi Su
AI4CE
33
1
0
02 Apr 2025
Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation
Y. Li
Bo Liu
Sheng Huang
Z. Zhang
Xiaotong Yuan
Richang Hong
32
0
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
46
1
0
30 Mar 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
Rohitash Chandra
Aryan Chaudhary
Yeshwanth Rayavarapu
42
0
0
27 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
40
0
0
08 Mar 2025
The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats
William Brach
Kristián Košťál
Michal Ries
90
0
0
04 Mar 2025
Mapping Trustworthiness in Large Language Models: A Bibliometric Analysis Bridging Theory to Practice
José Antonio Siqueira de Cerqueira
Kai-Kristian Kemell
Muhammad Waseem
Rebekah A. Rousi
Nannan Xi
Juho Hamari
50
3
0
27 Feb 2025
Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research
Veda C. Storey
Wei Thoo Yue
J. Leon Zhao
Roman Lukyanenko
38
0
0
25 Feb 2025
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
79
5
0
20 Feb 2025
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang
Haris Šikić
Lothar Thiele
O. Saukh
40
0
0
17 Feb 2025
Efficient Evaluation of Multi-Task Robot Policies With Active Experiment Selection
Abrar Anwar
Rohan Gupta
Zain Merchant
Sayan Ghosh
Willie Neiswanger
Jesse Thomason
OffRL
62
0
0
14 Feb 2025
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu
Lemao Liu
J. Wu
Tsz Ting Chung
Shunchi Zhang
JiangNan Li
Dit-Yan Yeung
Jie Zhou
77
1
0
13 Feb 2025
Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning
Peizhuang Cong
Wenpu Liu
Wenhan Yu
Haochen Zhao
Tong Yang
ALM
MoE
72
0
0
06 Feb 2025
Large Language Models as Common-Sense Heuristics
Andrey Borro
Patricia J. Riddle
Michael W Barley
Michael Witbrock
LRM
LM&Ro
126
1
0
31 Jan 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
74
6
0
28 Jan 2025
Explaining Decisions of Agents in Mixed-Motive Games
Maayan Orner
Oleg Maksimov
Akiva Kleinerman
Charles Ortiz
Sarit Kraus
83
0
0
28 Jan 2025
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
H. Zhang
Xiaoman Pan
Hongwei Wang
Kaixin Ma
W. Yu
Dong Yu
LLMAG
46
3
0
03 Jan 2025
Simulating Human-like Daily Activities with Desire-driven Autonomy
Yiding Wang
Yuxuan Chen
Fangwei Zhong
Long Ma
Yizhou Wang
60
2
0
09 Dec 2024
Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI
Sizhe Xing
Aolong Sun
Chengxi Wang
Yizhi Wang
Boyu Dong
...
Xi Xiao
R. Penty
Qixiang Cheng
Nan Chi
Junwen Zhang
106
0
0
04 Dec 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
33
0
0
13 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
36
0
0
11 Nov 2024
A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?
QiHong Chen
Jiawei Li
Jiecheng Deng
Jiachen Yu
Justin Tian Jin Chen
Iftekhar Ahmed
38
0
0
03 Nov 2024
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain
Vikram Krishnamurthy
LLMAG
25
0
0
02 Nov 2024
PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
Tianhao Zhang
Zhixiang Chen
Lyudmila Mihaylova
26
0
0
27 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
H. Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Hongxin Wei
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
43
1
0
24 Oct 2024
Can LVLMs Describe Videos like Humans? A Five-in-One Video Annotations Benchmark for Better Human-Machine Comparison
Shiyu Hu
Xuchen Li
X. Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
17
1
0
20 Oct 2024
Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework
Zhen Tao
Zhiyu Li
Runyu Chen
Dinghao Xi
Wei Xu
DeLMO
9
1
0
18 Oct 2024
Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
43
1
0
15 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
36
3
0
08 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
26
4
0
07 Oct 2024
A Survey on Point-of-Interest Recommendation: Models, Architectures, and Security
Qianru Zhang
Peng Yang
Junliang Yu
Haixin Wang
Xingwei He
S. Yiu
Hongzhi Yin
21
0
0
03 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
60
7
0
03 Oct 2024
Reasoning Elicitation in Language Models via Counterfactual Feedback
Alihan Hüyük
Xinnuo Xu
Jacqueline Maasch
Aditya V. Nori
Javier González
ReLM
LRM
44
1
0
02 Oct 2024
Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models
Joseph Lee
Shu Yang
Jae Young Baik
Xiaoxi Liu
Zhen Tan
...
Zixuan Wen
Bojian Hou
D. Duong-Tran
Tianlong Chen
Li Shen
42
1
0
02 Oct 2024
Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
Maor Reuben
Ortal Slobodin
Aviad Elyshar
Idan-Chaim Cohen
Orna Braun-Lewensohn
Odeya Cohen
Rami Puzis
28
0
0
29 Sep 2024
Recent Advances in OOD Detection: Problems and Approaches
Shuo Lu
YingSheng Wang
Lijun Sheng
Aihua Zheng
Lingxiao He
Jian Liang
OODD
37
2
0
18 Sep 2024