2311.09473
Cited By
JAB: Joint Adversarial Prompting and Belief Augmentation
16 November 2023
Ninareh Mehrabi
Palash Goyal
Anil Ramakrishna
Jwala Dhamala
Shalini Ghosh
Richard Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
AAML
Papers citing "JAB: Joint Adversarial Prompting and Belief Augmentation"
8 / 8 papers shown
Title
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
47
8
0
20 Jul 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
32
3
0
14 May 2024
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
Michael Feffer
Anusha Sinha
Wesley Hanwen Deng
Zachary Chase Lipton
Hoda Heidari
AAML
25
66
0
29 Jan 2024
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks
Mengsay Loem
Masahiro Kaneko
Naoaki Okazaki
LRM
19
5
0
14 Nov 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
218
441
0
23 Aug 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Analyzing Dynamic Adversarial Training Data in the Limit
Eric Wallace
Adina Williams
Robin Jia
Douwe Kiela
184
29
0
16 Oct 2021
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
206
607
0
03 Sep 2019