The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches

5 June 2024

Papers citing "The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches"

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts. Hongyu Chen, Seraphina Goldfarb-Tarrant. 12 Mar 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap. Gopi Krishnan Rajbahadur, G. Oliva, Dayi Lin, Ahmed E. Hassan. 28 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models. Ben Malin, Tatiana Kalganova, Nikoloas Boulgouris. [HILM] 03 Jan 2025
Generating a Low-code Complete Workflow via Task Decomposition and RAG. Orlando Marquez Ayala, Patrice Béchard. 29 Nov 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution. Hayley Ross, Kathryn Davidson, Najoung Kim. 23 Oct 2024
A Cross-Lingual Statutory Article Retrieval Dataset for Taiwan Legal Studies. Yen-Hsiang Wang, Feng-Dian Su, Tzu-Yu Yeh, Yao-Chung Fan. [RALM, AILaw] 15 Oct 2024
Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback. Taufiq Daryanto, Xiaohan Ding, Lance T Wilhelm, Sophia Stil, Kirk McInnis Knutsen, Eugenia H. Rho. 08 Oct 2024
Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation. Annalisa Szymanski, Simret Araya Gebreegziabher, Oghenemaro Anuyah, Ronald A Metoyer, T. Li. [ALM, ELM] 02 Oct 2024
Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams. Yuexing Hao, J. Holmes, Jared Hobson, Alexandra Bennett, Daniel K. Ebner, ..., N. Yu, Chris L. Hallemeier, Brooke E. Ball, Mark R. Waddle, Wei Liu. [LM&MA] 26 Sep 2024
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue. Jonathan Ivey, Shivani Kumar, Jiayu Liu, Hua Shen, Sushrita Rakshit, ..., Dustin Wright, Abraham Israeli, Anders Giovanni Møller, Lechen Zhang, David Jurgens. 12 Sep 2024
Language agents achieve superhuman synthesis of scientific knowledge. Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela M. Hinks, M. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, Andrew D. White. [ELM, HILM, ALM] 10 Sep 2024
With Little Power Comes Great Responsibility. Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky. 13 Oct 2020