ResearchTrend.AI
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

11 January 2024
Tianyu Cui
Yanling Wang
Chuanpu Fu
Yong Xiao
Sijia Li
Xinhao Deng
Yunpeng Liu
Qinglin Zhang
Ziyi Qiu
Peiyang Li
Zhixing Tan
Junwu Xiong
Xinyu Kong
Zujie Wen
Ke Xu
Qi Li
arXiv: 2401.05778

Papers citing "Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems"

35 / 35 papers shown
Attack and defense techniques in large language models: A survey and new perspectives
Zhiyu Liao
Kang Chen
Yuanguo Lin
Kangkang Li
Yunxuan Liu
Hefeng Chen
Xingwang Huang
Yuanhui Yu
AAML
52
0
0
02 May 2025
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation
Qianren Mao
Qili Zhang
Hanwen Hao
Zhentao Han
Runhua Xu
...
Bo Li
Y. Song
Jin Dong
Jianxin Li
Philip S. Yu
61
0
0
27 Apr 2025
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks
Ang Li
Yin Zhou
Vethavikashini Chithrra Raghuram
Tom Goldstein
Micah Goldblum
AAML
46
7
0
12 Feb 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri
Puneet Mathur
Franck Dernoncourt
Kanika Goswami
Ryan Rossi
Dinesh Manocha
85
3
0
14 Dec 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
207
178
0
20 Oct 2023
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
75
171
0
22 May 2023
Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection
Mithun Das
Saurabh Kumar Pandey
Animesh Mukherjee
33
10
0
22 May 2023
Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models
Shuai Zhao
Jinming Wen
Anh Tuan Luu
J. Zhao
Jie Fu
SILM
51
88
0
02 May 2023
ChatGPT as an Attack Tool: Stealthy Textual Backdoor Attack via Blackbox Generative Model Trigger
Jiazhao Li
Yijin Yang
Zhuofeng Wu
V. Vydiswaran
Chaowei Xiao
SILM
33
27
0
27 Apr 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
123
593
0
26 Apr 2023
The Internal State of an LLM Knows When It's Lying
A. Azaria
Tom Michael Mitchell
HILM
210
297
0
26 Apr 2023
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
Potsawee Manakul
Adian Liusie
Mark J. F. Gales
HILM
LRM
145
386
0
15 Mar 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
31
29
0
01 Jan 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
237
840
0
05 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trębacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
217
495
0
28 Sep 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
234
453
0
24 Sep 2022
Memorization in NLP Fine-tuning Methods
Fatemehsadat Mireshghallah
Archit Uniyal
Tianhao Wang
David E. Evans
Taylor Berg-Kirkpatrick
AAML
50
29
0
25 May 2022
Toxicity Detection with Generative Prompt-based Inference
Yau-Shian Wang
Y. Chang
59
34
0
24 May 2022
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset
Eric Michael Smith
Melissa Hall
Melanie Kambadur
Eleonora Presani
Adina Williams
53
128
0
18 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
313
8,261
0
28 Jan 2022
Differentially Private Fine-tuning of Language Models
Da Yu
Saurabh Naik
A. Backurs
Sivakanth Gopi
Huseyin A. Inan
...
Y. Lee
Andre Manoel
Lukas Wutschitz
Sergey Yekhanin
Huishuai Zhang
128
258
0
13 Oct 2021
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
236
191
0
15 Sep 2021
Can one hear the shape of a neural network?: Snooping the GPU via Magnetic Side Channel
H. Maia
Chang Xiao
Dingzeyu Li
E. Grinspun
Changxi Zheng
AAML
18
27
0
15 Sep 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
234
447
0
14 Jul 2021
CryptGPU: Fast Privacy-Preserving Machine Learning on the GPU
Sijun Tan
Brian Knott
Yuan Tian
David J. Wu
BDL
FedML
47
181
0
22 Apr 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
209
140
0
18 Apr 2021
Entity-level Factual Consistency of Abstractive Text Summarization
Feng Nan
Ramesh Nallapati
Zhiguo Wang
Cicero Nogueira dos Santos
Henghui Zhu
Dejiao Zhang
Kathleen McKeown
Bing Xiang
HILM
134
156
0
18 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
236
1,508
0
31 Dec 2020
Extracting Training Data from Large Language Models
Nicholas Carlini
Florian Tramèr
Eric Wallace
Matthew Jagielski
Ariel Herbert-Voss
...
Tom B. Brown
D. Song
Ulfar Erlingsson
Alina Oprea
Colin Raffel
MLAU
SILM
261
1,386
0
14 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
220
3,054
0
23 Jan 2020
DeepSafe: A Data-driven Approach for Checking Adversarial Robustness in Neural Networks
D. Gopinath
Guy Katz
C. Păsăreanu
Clark W. Barrett
AAML
40
87
0
02 Oct 2017
Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks
Guy Katz
Clark W. Barrett
D. Dill
Kyle D. Julian
Mykel Kochenderfer
AAML
219
1,818
0
03 Feb 2017