ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches

5 June 2024
Bhashithe Abeysinghe
Ruhan Circi
    ELM
arXiv: 2406.03339

Papers citing "The Challenges of Evaluating LLM Applications: An Analysis of Automated, Human, and LLM-Based Approaches"

12 / 12 papers shown
Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Hongyu Chen
Seraphina Goldfarb-Tarrant
45
0
0
12 Mar 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
39
0
0
28 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikolaos Boulgouris
HILM
59
2
0
03 Jan 2025
Generating a Low-code Complete Workflow via Task Decomposition and RAG
Orlando Marquez Ayala
Patrice Béchard
60
1
0
29 Nov 2024
Is artificial intelligence still intelligence? LLMs generalize to novel adjective-noun pairs, but don't mimic the full human distribution
Hayley Ross
Kathryn Davidson
Najoung Kim
21
0
0
23 Oct 2024
A Cross-Lingual Statutory Article Retrieval Dataset for Taiwan Legal Studies
Yen-Hsiang Wang
Feng-Dian Su
Tzu-Yu Yeh
Yao-Chung Fan
RALM
AILaw
11
0
0
15 Oct 2024
Conversate: Supporting Reflective Learning in Interview Practice Through Interactive Simulation and Dialogic Feedback
Taufiq Daryanto
Xiaohan Ding
Lance T Wilhelm
Sophia Stil
Kirk McInnis Knutsen
Eugenia H. Rho
13
0
0
08 Oct 2024
Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation
Annalisa Szymanski
Simret Araya Gebreegziabher
Oghenemaro Anuyah
Ronald A Metoyer
T. Li
ALM
ELM
27
6
0
02 Oct 2024
Retrospective Comparative Analysis of Prostate Cancer In-Basket Messages: Responses from Closed-Domain LLM vs. Clinical Teams
Yuexing Hao
J. Holmes
Jared Hobson
Alexandra Bennett
Daniel K. Ebner
...
N. Yu
Chris L. Hallemeier
Brooke E. Ball
Mark R. Waddle
Wei Liu
LM&MA
27
0
0
26 Sep 2024
Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
Jonathan Ivey
Shivani Kumar
Jiayu Liu
Hua Shen
Sushrita Rakshit
...
Dustin Wright
Abraham Israeli
Anders Giovanni Møller
Lechen Zhang
David Jurgens
47
3
0
12 Sep 2024
Language agents achieve superhuman synthesis of scientific knowledge
Michael D. Skarlinski
Sam Cox
Jon M. Laurent
James D. Braza
Michaela M. Hinks
M. Hammerling
Manvitha Ponnapati
Samuel G. Rodriques
Andrew D. White
ELM
HILM
ALM
18
28
0
10 Sep 2024
With Little Power Comes Great Responsibility
Dallas Card
Peter Henderson
Urvashi Khandelwal
Robin Jia
Kyle Mahowald
Dan Jurafsky
225
115
0
13 Oct 2020