Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content

21 February 2024
Federico Bianchi
James Y. Zou

Papers citing "Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content"

5 / 5 papers shown

1. Decoding Hate: Exploring Language Models' Reactions to Hate Speech
   Paloma Piot, Javier Parapar
   01 Oct 2024

2. Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
   Apurv Verma, Satyapriya Krishna, Sebastian Gehrmann, Madhavan Seshadri, Anu Pradhan, Tom Ault, Leslie Barrett, David Rabinowitz, John Doucette, Nhathai Phan
   20 Jul 2024

3. Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature
   Tong Zhou, Xuandong Zhao, Xiaolin Xu, Shaolei Ren
   04 Jun 2024

4. On the Risk of Misinformation Pollution with Large Language Models
   Yikang Pan, Liangming Pan, Wenhu Chen, Preslav Nakov, Min-Yen Kan, W. Wang
   23 May 2023

5. Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
   Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
   23 Aug 2022