2403.14578
RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
21 March 2024
William James Bolton
Rafael Poyiadzi
Edward R. Morrell
Gabriela van Bergen Gonzalez Bueno
Lea Goetz
Papers citing "RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain"
6 / 6 papers shown
Trusting CHATGPT: how minor tweaks in the prompts lead to major differences in sentiment classification
Jaime E. Cuellar
Oscar Moreno-Martinez
Paula Sofia Torres-Rodriguez
Jaime Andres Pavlich-Mariscal
Andres Felipe Mican-Castiblanco
Juan Guillermo Torres-Hurtado
23
0
0
16 Apr 2025
The Need for Guardrails with Large Language Models in Medical Safety-Critical Settings: An Artificial Intelligence Application in the Pharmacovigilance Ecosystem
Joe B Hakim
Jeffery L. Painter
D. Ramcharran
V. Kara
Greg Powell
Paulina Sobczak
Chiho Sato
Andrew Bate
Andrew Beam
20
2
0
01 Jul 2024
MedEval: A Multi-Level, Multi-Task, and Multi-Domain Medical Benchmark for Language Model Evaluation
Zexue He
Yu-Xiang Wang
An Yan
Yao Liu
Eric Y. Chang
Amilcare Gentili
Julian McAuley
Chun-Nan Hsu
ELM
54
14
0
21 Oct 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
209
559
0
03 May 2023
Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark
Nouha Dziri
Hannah Rashkin
Tal Linzen
David Reitter
ALM
185
79
0
30 Apr 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
202
791
0
13 Sep 2019