2311.14966
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains
25 November 2023
Chia-Chien Hung
Wiem Ben-Rim
Lindsay Frost
Lars Bruckner
Carolin (Haas) Lawrence
AILaw
ALM
ELM
Papers citing
"Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains"
14 / 14 papers shown
Red Teaming Large Language Models for Healthcare
Vahid Balazadeh
Michael Cooper
David Pellow
Atousa Assadi
Jennifer Bell
...
Syed Ahmar Shah
Babak Taati
Balagopal Unnikrishnan
Stephanie Williams
Rahul G. Krishnan
LM&MA
26
0
0
01 May 2025
ScholarMate: A Mixed-Initiative Tool for Qualitative Knowledge Work and Information Sensemaking
Runlong Ye
Patrick Yung Kang Lee
Matthew Varona
Oliver Huang
Carolina Nobre
36
0
0
19 Apr 2025
Position: Standard Benchmarks Fail -- LLM Agents Present Overlooked Risks for Financial Applications
Zichen Chen
Jiaao Chen
Jianda Chen
Misha Sra
ELM
34
1
0
21 Feb 2025
SafeDialBench: A Fine-Grained Safety Benchmark for Large Language Models in Multi-Turn Dialogues with Diverse Jailbreak Attacks
Hongye Cao
Yanming Wang
Sijia Jing
Ziyue Peng
Zhixin Bai
...
Yang Gao
Fanyu Meng
Xi Yang
Chao Deng
Junlan Feng
AAML
41
0
0
16 Feb 2025
Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models
Hao Yang
Lizhen Qu
Ehsan Shareghi
Gholamreza Haffari
AAML
36
3
0
31 Oct 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
47
8
0
20 Jul 2024
MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs
Vera Neplenbroek
Arianna Bisazza
Raquel Fernández
29
6
0
11 Jun 2024
D-NLP at SemEval-2024 Task 2: Evaluating Clinical Inference Capabilities of Large Language Models
Duygu Altinok
AI4MH
LRM
LM&MA
ELM
21
1
0
07 May 2024
On the Limitations of Reference-Free Evaluations of Generated Text
Daniel Deutsch
Rotem Dror
Dan Roth
27
44
0
22 Oct 2022
Calibrating Factual Knowledge in Pretrained Language Models
Qingxiu Dong
Damai Dai
Yifan Song
Jingjing Xu
Zhifang Sui
Lei Li
KELM
225
81
0
07 Oct 2022
State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes
Mario Giulianelli
Verna Dankers
Mikel Artetxe
Yanai Elazar
...
Leila Khalatbari
Maria Ryskina
Rita Frieske
Ryan Cotterell
Zhijing Jin
103
91
0
06 Oct 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
129
94
0
01 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
202
791
0
13 Sep 2019