AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications

14 November 2023
Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti

Papers citing "AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications"

32 / 32 papers shown
SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat
ELM · 28 Apr 2025

Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao, Shibo Hong, X. Li, Jiahao Ying, Yubo Ma, ..., Juanzi Li, Aixin Sun, Xuanjing Huang, Tat-Seng Chua, Yu Jiang
ALM, ELM · 26 Apr 2025

RAG LLMs are Not Safer: A Safety Analysis of Retrieval-Augmented Generation for Large Language Models
Bang An, Shiyue Zhang, Mark Dredze
25 Apr 2025

Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura, Sahil Wadhwa, Jesse Zymet, Akshay Gupta, Andy Luo, Melissa Kazemi Rad, Swapnil Shinde, Mohammad Sorower
AAML · 03 Mar 2025

Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo, Giandomenico Cornacchia, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Beat Buesser, Mark Purcell, Pin-Yu Chen, P. Sattigeri, Kush R. Varshney
AAML · 24 Feb 2025

Diversity Helps Jailbreak Large Language Models
Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao
AAML · 06 Nov 2024

AURA: Amplifying Understanding, Resilience, and Awareness for Responsible AI Content Work
Alice Qian Zhang, Judith Amores, Mary L. Gray, Mary Czerwinski, J. Suh
03 Nov 2024

Active Learning for Robust and Representative LLM Generation in Safety-Critical Scenarios
Sabit Hassan, Anthony Sicilia, Malihe Alikhani
14 Oct 2024

Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat, Stefan Schoepf, Giulio Zizzo, Giandomenico Cornacchia, Muhammad Zaid Hameed, ..., Elizabeth M. Daly, Mark Purcell, P. Sattigeri, Pin-Yu Chen, Kush R. Varshney
AAML · 23 Sep 2024

Debiasing Text Safety Classifiers through a Fairness-Aware Ensemble
Olivia Sturman, Aparna Joshi, Bhaktipriya Radharapu, Piyush Kumar, Renee Shelby
05 Sep 2024

SAGE-RT: Synthetic Alignment data Generation for Safety Evaluation and Red Teaming
Anurakt Kumar, Divyanshu Kumar, Jatan Loya, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, P. Harshangi
SyDa · 14 Aug 2024

ShieldGemma: Generative AI Content Moderation Based on Gemma
Wenjun Zeng, Yuchi Liu, Ryan Mullins, Ludovic Peran, Joe Fernandez, ..., Drew Proud, Piyush Kumar, Bhaktipriya Radharapu, Olivia Sturman, O. Wahltinez
AI4MH · 31 Jul 2024

PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing
Blazej Manczak, Eliott Zemour, Eric Lin, Vaikkunth Mugunthan
23 Jul 2024

Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024

How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies
Alina Leidinger, Richard Rogers
16 Jul 2024

Automated Adversarial Discovery for Safety Classifiers
Yash Kumar Lal, Preethi Lahoti, Aradhana Sinha, Yao Qin, Ananth Balashankar
24 Jun 2024

Guardrails for avoiding harmful medical product recommendations and off-label promotion in generative AI models
Daniel Lopez-Martinez
MedIm · 24 Jun 2024

Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang, Xiaoyuan Yi, Zhihua Wei, Shu Wang, Xing Xie
ALM, ELM · 20 Jun 2024

STAR: SocioTechnical Approach to Red Teaming Language Models
Laura Weidinger, John F. J. Mellor, Bernat Guillen Pegueroles, Nahema Marchal, Ravin Kumar, ..., Mark Diaz, Stevie Bergman, Mikel Rodriguez, Verena Rieser, William S. Isaac
VLM · 17 Jun 2024

Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Yipeng Zhang, Haitao Mi, H. Meng
CLL, KELM · 10 Jun 2024

MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation
Yan Ma, Yu Qiao, Pengfei Liu
09 Jun 2024

CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks
Maciej Besta, Lorenzo Paleari, Aleš Kubíček, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, H. Niewiadomski, Torsten Hoefler
04 Jun 2024

ConSiDERS-The-Human Evaluation Framework: Rethinking Human Evaluation for Generative Large Language Models
Aparna Elangovan, Ling Liu, Lei Xu, S. Bodapati, Dan Roth
ELM · 28 May 2024

CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge
Yu Ying Chiu, Amirhossein Ajalloeian, Maria Antoniak, Chan Young Park, Shuyue Stella Li, Mehar Bhatia, Sahithya Ravi, Yulia Tsvetkov, Vered Shwartz, Yejin Choi
10 Apr 2024

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy
ELM, KELM · 08 Apr 2024

LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
Zheng-Xin Yong, Cristina Menghini, Stephen H. Bach
21 Feb 2024

Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer, Anusha Sinha, Wesley Hanwen Deng, Zachary Chase Lipton, Hoda Heidari
AAML · 29 Jan 2024

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
PILM · 16 Oct 2023

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
ALM, ELM, AILaw · 02 Aug 2023

Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving
ALM, AAML · 28 Sep 2022

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM, LRM · 24 May 2022

Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang, Jason W. Wei, Dale Schuurmans, Quoc Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou
ReLM, BDL, LRM, AI4CE · 21 Mar 2022