Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
arXiv:2402.13926
21 February 2024
Federico Bianchi, James Y. Zou
Papers citing "Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content" (5 of 5 papers shown)
Decoding Hate: Exploring Language Models' Reactions to Hate Speech
Paloma Piot, Javier Parapar
01 Oct 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
20 Jul 2024
Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature
Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren
04 Jun 2024
On the Risk of Misinformation Pollution with Large Language Models
Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, W. Wang
23 May 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022