ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2402.09674
  4. Cited By
PAL: Proxy-Guided Black-Box Attack on Large Language Models

PAL: Proxy-Guided Black-Box Attack on Large Language Models

15 February 2024
Chawin Sitawarin
Norman Mu
David A. Wagner
Alexandre Araujo
    ELM
ArXivPDFHTML

Papers citing "PAL: Proxy-Guided Black-Box Attack on Large Language Models"

10 / 10 papers shown
Title
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Adversarial Attacks in Multimodal Systems: A Practitioner's Survey
Shashank Kapoor
Sanjay Surendranath Girija
Lakshit Arora
Dipen Pradhan
Ankit Shetgaonkar
Aman Raj
AAML
65
0
0
06 May 2025
RedPajama: an Open Dataset for Training Large Language Models
RedPajama: an Open Dataset for Training Large Language Models
Maurice Weber
Daniel Y. Fu
Quentin Anthony
Yonatan Oren
S. Adams
...
Tri Dao
Percy Liang
Christopher Ré
Irina Rish
Ce Zhang
96
52
0
19 Nov 2024
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Deciphering the Chaos: Enhancing Jailbreak Attacks via Adversarial Prompt Translation
Qizhang Li
Xiaochen Yang
W. Zuo
Yiwen Guo
AAML
59
0
0
15 Oct 2024
Does Refusal Training in LLMs Generalize to the Past Tense?
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
42
27
0
16 Jul 2024
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Xunguang Wang
Daoyuan Wu
Zhenlan Ji
Zongjie Li
Pingchuan Ma
Shuai Wang
Yingjiu Li
Yang Liu
Ning Liu
Juergen Rahmel
AAML
71
8
0
08 Jun 2024
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks
Maksym Andriushchenko
Francesco Croce
Nicolas Flammarion
AAML
81
157
0
02 Apr 2024
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
110
292
0
19 Sep 2023
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
225
495
0
28 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
245
1,977
0
31 Dec 2020
1