2406.03339
The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches
5 June 2024
Bhashithe Abeysinghe
Ruhan Circi
ELM
Papers citing
"The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches"
12 / 12 papers shown
Title
Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Hongyu Chen
Seraphina Goldfarb-Tarrant
45
0
0
12 Mar 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
39
0
0
28 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikolaos Boulgouris
HILM
59
2
0
03 Jan 2025
Generating a Low-code Complete Workflow via Task Decomposition and RAG
Orlando Marquez Ayala
Patrice Béchard
60
1
0
29 Nov 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution
Hayley Ross
Kathryn Davidson
Najoung Kim
21
2
0
23 Oct 2024
A Cross-Lingual Statutory Article Retrieval Dataset for Taiwan Legal Studies
Yen-Hsiang Wang
Feng-Dian Su
Tzu-Yu Yeh
Yao-Chung Fan
RALM
AILaw
11
0
0
15 Oct 2024
Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback
Taufiq Daryanto
Xiaohan Ding
Lance T Wilhelm
Sophia Stil
Kirk McInnis Knutsen
Eugenia H. Rho
13
0
0
08 Oct 2024
Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation
Annalisa Szymanski
Simret Araya Gebreegziabher
Oghenemaro Anuyah
Ronald A Metoyer
T. Li
ALM
ELM
27
6
0
02 Oct 2024
Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams
Yuexing Hao
J. Holmes
Jared Hobson
Alexandra Bennett
Daniel K. Ebner
...
N. Yu
Chris L. Hallemeier
Brooke E. Ball
Mark R. Waddle
Wei Liu
LM&MA
30
0
0
26 Sep 2024
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
Jonathan Ivey
Shivani Kumar
Jiayu Liu
Hua Shen
Sushrita Rakshit
...
Dustin Wright
Abraham Israeli
Anders Giovanni Møller
Lechen Zhang
David Jurgens
47
3
0
12 Sep 2024
Language agents achieve superhuman synthesis of scientific knowledge
Michael D. Skarlinski
Sam Cox
Jon M. Laurent
James D. Braza
Michaela M. Hinks
M. Hammerling
Manvitha Ponnapati
Samuel G. Rodriques
Andrew D. White
ELM
HILM
ALM
18
28
0
10 Sep 2024
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
225
115
0
13 Oct 2020